ABSTRACT

This report is the second part of an investigation of the so-called "Dominant Label
Selector" (DLS), which is the rear stage of a novel associative memory called "Selective
Reflexive Memory" (SRM). The front stage of the SRM consists of a Bidirectional Linear
Transformer (BLT), the output of which is processed by the DLS. The BLT transforms a
bipolar input x into a linear combination of Hadamard vectors, and the task of the DLS is
to select the dominant Hadamard vector from this linear combination. This vector is then
returned to the BLT for a backstroke, which produces the stored vector closest to x. An
attractive choice of DLS is the "Quadratic Hadamard Memory", which employs quadratic
activations and stores Hadamard vectors. Previously, this DLS was investigated by means
of the asynchronous discrete model. In the present report the investigation of the Quadratic
Hadamard Memory is extended to the continuous model, in which the input capacitance and
resistance of the amplifiers are accounted for, and the coupling between BLT and DLS can be
studied.


A  Liapunov  function  ("energy”)  is  constructed,  and  it  follows  that  the  DLS  is 
stable.  Sufficient  conditions  for  instability  of  stationary  states  are  derived  from  the  energy 
and  also  from  the  equations  of  motion,  in  terms  of  the  divergence  of  the  flow  in  activation 
space.  The  energy  landscape  is  explored  for  the  case  of  maximum  symmetry,  i.e.,  for  zero 
thresholds.  We  find  a  small  central  crater  with  an  undulated  ridge.  Gullies  run  in  the  radial 
direction,  over  the  ridge,  and  down  the  outer  slopes,  toward  the  Hadamard  points.  The 
deepest  gullies  are  those  directed  towards  a  Hadamard  point.  The  stationary  points  on  the 
ridge are unstable and found to have principal Hadamard spectra, i.e., their signals are
proportional to the sum of m Hadamard vectors. For m=1, the signal is proportional to a
single Hadamard vector (the spectrum is "pure"). For this case, and also for principal
spectra with m=2, the signal path is a radial line. For principal spectra with larger m, the
path curves in the region where the neuron output function is nonlinear. The dynamics is
decomposed into longitudinal and transverse parts. This decomposition leads to an
adiabatic fake dynamics, in which the signal is constrained to a hypersphere $H_R$, and the
longitudinal dynamics is omitted. We let the signal find its transverse equilibrium on $H_R$
before going to the next hypersphere $H_{R+dR}$. The succession of transverse equilibria
forms the transverse adiabatic path. This path is found to link the stationary points of the
true  dynamics  with  signal  points  that  have  principal  spectra  in  the  proportional  region,  i.e., 
in  the  region  where  the  signals  are  proportional  to  the  activations,  either  exactly  or 
approximately.  It  is  found  that  for  thresholds  with  principal  spectra,  the  signal  spectrum  is 
conserved  in  the  proportional  region.  If  the  BLT  output  u  is  applied  to  the  DLS  as  external 
coupling,  and  a  certain  large  uniform  threshold  term  is  added,  then  the  DLS  has  as  only 
bipolar  stationary  points  the  Hadamard  points.  However,  the  large  uniform  threshold  term 
spoils  the  early  dynamics,  by  pushing  the  signal  point  out  of  the  gully  that  runs  to  the 
Hadamard point that is dominant in the BLT output u. To keep this from happening, the
large uniform threshold term is omitted, but then spurious stable states are let back in. It
is  shown  however  that  such  spurious  states  are  dynamically  inaccessible  if  the  external 
coupling  constant  is  chosen  properly,  and  the  gain  is  large  enough.  This  is  shown  in  a 
tedious  analysis  which  circumvents  the  need  to  integrate  the  N  coupled  nonlinear 
differential  equations,  something  I  cannot  do.  In  the  proportional  region  these  equations 
can  be  integrated  in  spite  of  the  nonlinear  selfcoupling  (i.e.,  the  quadratic  activation).  For 
small  coupling  constant,  the  signals  in  this  region  undergo  spectral  purification,  which  can 
be  made  as  large  as  desired  by  choosing  the  gain  large  enough.  In  this  purification  the 
dominant  Hadamard  component  in  the  signal  becomes  even  more  dominant  as  time  goes 
on. After leaving the proportional region, a final purification takes place which makes
the spectrum pure, i.e., the signal becomes a single Hadamard vector. Thus we have a proof
that a DLS of dimension N (which needs to be a power of 2, but is otherwise arbitrary),
externally  coupled  to  a  BLT,  will  provide  perfect  associative  recall  of  N  stored  vectors,  if 
the  coupling  constant  is  chosen  properly,  and  the  gain  is  large  enough.  Large  gains  are 
desirable,  but  the  coupling  constants  specified  in  the  theorem  are  much  too  small  for 
practical  applications. 

Numerical  computations  have  been  performed  for  SRMs  of  dimension  N=8  and  16, 
for  large  gains  and  practical  values  of  the  coupling  constant.  Perfect  associative  recall  was 
found  of  N  random  stored  bipolar  vectors,  for  any  bipolar  input  with  a  unique  nearest 
stored  vector. 

Because  of  the  length  of  the  paper,  we  show  an  overview  of  the  sections  and  the 
theorems,  stated  in  abbreviated  form,  for  the  purpose  of  orientation  only. 

INTRODUCTION 

SELECTIVE  REFLEXIVE  MEMORY  (SRM) 

DOMINANT  LABEL  SELECTOR  (DLS) 

CONTINUOUS  MODEL  FOR  DLS 
NEURON  OUTPUT  FUNCTIONS 
THE  ENERGY 
Theorem  1:  The  DLS  is  stable. 


SPECIFIC  FORMS  OF  EQUATIONS  OF  MOTION  AND  ENERGY 
COUPLING  SCHEMES 
DYNAMIC  REGIONS 
STATIONARY POINTS OF THE $I_N$

Theorem 2: For thresholds $r = N^2 - 4N$ + external coupling, the only stationary
points of the $I_N$ are the Hadamard points.

STABILITY 

Theorem  3:  Any  stable  point  of  the  DLS  must  either  lie  in  the  region  y^<l/2  or  in 
the  corners  of  the  solid  hypercube. 

Theorem 4: For a DLS with threshold bound of $N^2/4$ there is a stationary point in
every Hadamard corner.

Theorem  5:  The  stationary  point  of  Theorem  4  is  stable. 

ENERGY  LANDSCAPE 

Theorem  6:  For  zero  thresholds,  the  origin  in  signal  space  is  stable. 

Theorem  7:  For  zero  thresholds,  the  stationary  states  in  the  proportional  region,  and 
away  from  the  origin,  are  unstable. 

Theorem  8:  The  ridge  set  has  principal  spectra. 

Theorem 9: For zero thresholds, the stationary points of the energy function
constrained to a hypersphere, in the proportional region, have principal spectra.

DECOMPOSITION  INTO  LONGITUDINAL  AND  TRANSVERSE  DYNAMICS 
ADIABATIC  FAKE  DYNAMICS 

Theorem  10:  Through  every  stationary  point  of  the  true  DLS  dynamics  goes  a 
transverse  adiabatic  path,  along  which,  in  the  proportional  region,  the  transverse 
force  vanishes. 

Theorem 11: For zero threshold, the transverse equilibria in the proportional region
have principal spectra.

CONSERVATION OF PRINCIPAL SPECTRA

Theorem 12: For a DLS for which the thresholds have a principal spectrum, the
signal spectrum is conserved in the proportional region.

SPURIOUS  STATES  SHIELD  SPOILS  EARLY  DYNAMICS 

DYNAMICS  IN  THE  PROPORTIONAL  REGION 

SPECTRAL  PURIFICATION  IN  THE  PROPORTIONAL  REGION 

Theorem 13: For a DLS externally coupled to a BLT by $r_a = \mu(u_a + N^2\delta_{a1})$ and with
coupling constant $\mu = 1/(4g^2N^2)$, the dominance ratio can be made arbitrarily large
by taking either the gain or the dimension large enough.

FINAL  PURIFICATION 

Theorem  14:  For  a  DLS  with  coupling  and  gain  as  in  Theorem  13,  and  with 
activations  reset  to  zero  at  the  time  of  application  of  the  BLT  output  u,  the  signal  y 
settles  at  the  dominant  Hadamard  vector  in  u,  if  the  gain  is  large  enough. 

NUMERICAL  COMPUTATIONS 
CONCLUSION 


INTRODUCTION 

The  main  problems  of  concern  in  associative  memories  are  early  saturation,  fault 
sensitivity,  and  hardware  implementation.  Hopfield  memories  [1]  are  robust,  but  suffer 
from  early  saturation.  The  latter  problem  is  solved  by  ART  [2],  but  at  the  cost  of  fault 
sensitivity  of  the  upper  layer.  Early  saturation  can  also  be  circumvented  by  employing 
vectors  with  dilute  information  [3],  but  this  approach  is  wasteful  of  memory  dimension. 
The  use  of  connection  matrices  that  are  more  sophisticated  than  the  Hopfield  matrix  [1] 
also  may  solve  this  problem  [4,5],  but  at  a  loss  of  locality  of  the  learning  rule,  with 
unacceptable  consequences  for  hardware  implementations  in  applications  with  large 
dimension.  The  use  of  Coulomb  like  activations  [6]  makes  it  possible  to  load  up  associative 
memories  to  great  density,  but  it  forces  the  individual  neurons  to  be  rather  complicated, 
with  undesirable  consequences  for  hardware  implementations.  Making  memories 
bidirectional  [7]  does  not  give  relief  of  early  saturation  [8]. 

In  Phase  I  of  the  present  DARPA  SBIR  project  we  outlined  a  new  approach  to 
associative  memories  which  appears  to  have  the  potential  of  overcoming  the  early 
saturation problem, while retaining fault tolerance. The approach involves a two-stage
memory,  shown  schematically  in  Fig.  1. 

In  Phase  II  of  the  project,  a  promising  architecture  for  the  rear  stage  was  identified 
and  investigated.  The  device  is  an  associative  memory  with  Hadamard  vectors  as  stored 
states,  and  with  the  activation  taken  as  a  quadratic  function  of  the  incoming  neuron 
signals,  instead  of  the  customary  linear  function.  Using  a  quadratic  activation  of  course 
carries  a  penalty  in  hardware  implementation,  and  it  may  lead  to  proliferation  of  the 
number  of  connections.  The  latter  is  not  found  to  be  a  problem,  as  the  number  of 
connections  required  is  about  the  same  as  for  a  fully  connected  Hopfield  memory.  The 
hardware  complication  due  to  quadratic  activations  appears  to  be  rather  mild  as  compared 
to  that  due  to  Coulomb  like  activations. 

The  resulting  "Quadratic  Hadamard  Memory"  was  investigated  with  the 
asynchronous discrete model, as discussed in an earlier report [9]. In the present report the
investigation  is  extended  to  the  continuous  model. 

The  associative  memory  considered  (see  Fig.  1)  consists  of  a  bidirectional  front  stage 
which  is  capable  of  a  forward  stroke  and  a  backstroke,  both  of  which  are  linear 
transformations.  We  call  this  frontstage  a  "Bidirectional  Linear  Transformer"  (BLT).  In  its 
forward  stroke,  the  BLT  transforms  the  N  dimensional  bipolar  input  vector  x  into  a  vector 
u  with  integer  components  in  the  range  [-N,N].  The  BLT  is  arranged  such  that  its  rear 
output  u  is  a  linear  combination  of  orthonormal  labels  of  the  stored  states.  The  vector  u  is 
presented  to  the  rear  stage,  called  "Dominant  Label  Selector"  (DLS).  This  device  is  to 
select  the  dominant  label  from  the  linear  combination  u.  The  labels  are  here  chosen  as 
Hadamard  vectors.  The  settled  DLS  output  y  is  to  be  the  Hadamard  vector  closest  to  u. 

The  vector  y  is  returned  to  the  BLT  and  processed  in  a  backstroke,  which  produces  from 
the  label  y  the  stored  state  to  which  it  belongs.  If  everything  works  as  expected,  that  stored 
state  is  the  one  closest  to  the  input  x.  The  whole  device,  BLT  plus  DLS,  is  called  "Selective 
Reflexive  Memory"  (SRM),  where  "selective"  indicates  the  selection  of  the  dominant 
Hadamard  vector  by  the  DLS,  and  "reflexive"  alludes  to  the  bidirectional  nature  of  the 
BLT.  The  BLT  may  be  seen  as  a  BAM  [7]  without  rear  thresholding.  Front  thresholding  is 
optional. The SRM may be likened to an ART, in which the winner-take-all circuit in the
top layer is replaced by a DLS. Since the DLS is a distributed winner-take-all, its use is
expected  to  overcome  the  fault  sensitivity  of  the  ART  top  layer. 

Conventions and notations are much the same as in [9]. The input and output of
a  neuron  threshold  function  are  respectively  called  "activation"  and  "signal"  of  the  neuron. 
The  summation  convention  of  tensor  calculus  has  been  used  where  convenient.  In  order  to 
distinguish  from  unsummed  expressions,  we  have  used  the  convention  in  its  strict  form 
[10]: in a product, summation over a repeated index is implied only if the index appears
twice, once as a subscript, and once as a superscript. For instance, $u_a v^a$ is summed, but
$u_a v_a$ is not.

Indices are used as follows: i, j, and k denote components in input space; a, b, c, d,
and p denote components in the space between the BLT and DLS, and also components of
the DLS state vectors; α, β, and γ are used to name stored vectors and their labels, the
Hadamard vectors. All indices range from 1 to N. A statement involving unspecified "life"
indices [10] is meant to be true for all values 1 to N for such indices.

The Kronecker delta is written as δ with two indices. If the indices have the same
value, the symbol stands for unity, else it stands for zero.

Indices are raised and lowered with the Kronecker delta as metric tensor. Hence, $v^a$
and $v_a$ have the same numerical value.


As  a  further  simplification  of  appearance,  1  is  often  written  as  +  and  -1  as  -,  when 
no  confusion  with  composition  symbols  can  arise. 

Customary mathematical shorthand is used where convenient: ∈ means "is an
element of", ∀ means "for all", ∃ means "there exists", ==> means "implies", and <==
means "is implied by".

In the continuous model of the DLS, the signals y lie either in the closed solid
hypercube $J_N = [-1,1]^N$, or in the open solid hypercube $(-1,1)^N$, depending on
whether or not the sigmoidal neuron output function s(v) attains the values ±1. In the
former case, the signals y can settle at one of the corner points of the hypercube, defined as
the set $I_N = \{-1,1\}^N$. The $I_N$ is the set of signals considered in the discrete model.


SELECTIVE  REFLEXIVE  MEMORY 

The  Selective  Reflexive  Memory  (SRM)  consists  of  two  stages.  The  front  stage  is  a 
Bidirectional  Linear  Transformer  (BLT)  which  in  its  forward  stroke  performs  on  the 
bipolar  input  vector  x  the  linear  transformation 

$u_b = B_b{}^i x_i$ . (1)

The connection matrix of the BLT is chosen as

$B_b{}^i = h^\alpha{}_b\, q_\alpha{}^i$ , (2)

where $q_\alpha$, α=1 to N, are the stored bipolar vectors, and the $h_\alpha$ are Hadamard vectors. We
have chosen the same dimension N for the BLT front and rear vector spaces, and have
taken  the  number  of  stored  states  equal  to  N  as  well.  It  will  be  clear  from  the  theory  how 
to  modify  these  choices  if  desired.  The  structure  (2)  of  the  BLT  connection  matrix  is 
Hebbian,  i.e.,  it  can  be  built  up  adaptively  by  Hebb  learning. 

The Hadamard vectors $h_\alpha$ are rows of a Hadamard matrix, i.e., an orthogonal
matrix (up to a scalar factor) with entries + and -. Properties of Hadamard vectors used
in this report are shown in Appendix A. The Hadamard vector $h_\alpha$ serves as a label for the
stored state $q_\alpha$.

With the connection matrix (2), Eq. (1) gives for the rear output of the BLT

$u_b = h^\alpha{}_b\, q_\alpha{}^i x_i = c_\alpha h^\alpha{}_b$ , (3)

where $c_\alpha = x \cdot q_\alpha$ (4)

is the scalar product of the vectors x and $q_\alpha$. If $q_\beta$ is the stored vector closest to the input
x, then $c_\beta$ is the largest among the coefficients $c_\alpha$. Suppose that behind the BLT there is a
stage which selects, from the linear combination $c_\alpha h^\alpha$, the dominant Hadamard vector
$h_\beta$. Such a device is here called a "Dominant Label Selector" (DLS). We postpone discussion
of the DLS, and consider the processing of the DLS output y, for now assumed to be the
dominant Hadamard vector $h_\beta$. As depicted schematically in Fig. 1, the DLS output y is
returned to the BLT, to be used in a backstroke

$w^i = y^b B_b{}^i$ . (5)

With $y = h_\beta$ and the connection matrix (2), (5) gives as result of the BLT backstroke

$w^i = h_\beta{}^b\, h^\alpha{}_b\, q_\alpha{}^i = N \delta_\beta{}^\alpha q_\alpha{}^i = N q_\beta{}^i$ , (6)

where use has been made of the orthonormality of the Hadamard vectors, expressed by
(A2) in Appendix A. If w is thresholded with the signum function $s_0$ we get

$x'^i = s_0(w^i) = q_\beta{}^i$ , (7)

which is the stored vector that is closest to the input x. Hence, if the DLS works as
required, the SRM performs perfect associative recall of any one of the N stored vectors.
There would not be any spurious stable states.
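
The recall cycle of Eqs. (1) to (7) can be sketched numerically. The following is a minimal illustration, not the device studied in this report: the DLS is replaced by an idealized selection (an argmax over the scalar products of u with the Hadamard vectors), the Hadamard matrix is assumed to be of Sylvester type, and the names `sylvester_hadamard` and `srm_recall` are hypothetical.

```python
import numpy as np

def sylvester_hadamard(n):
    """Return an n x n Sylvester-type Hadamard matrix (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def srm_recall(x, q, h):
    """One SRM cycle with an idealized DLS (assumption: argmax selection).

    x: bipolar input, shape (N,)
    q: stored bipolar vectors as rows, shape (N, N)
    h: Hadamard label vectors as rows, shape (N, N)
    """
    b = h.T @ q                    # Eq. (2): B[b, i] = sum_alpha h[alpha, b] * q[alpha, i]
    u = b @ x                      # Eq. (1): forward stroke
    beta = int(np.argmax(h @ u))   # idealized DLS: label with largest scalar product u . h_alpha
    y = h[beta]
    w = y @ b                      # Eq. (5): backstroke, equals N * q[beta] by Eq. (6)
    return np.sign(w).astype(int), beta   # Eq. (7): signum thresholding
```

With stored vectors that have a unique nearest neighbor to the input, `srm_recall` returns that stored vector together with its label index. The continuous DLS analyzed in the remainder of the report is meant to perform the selection step that the argmax does here.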

There  is  the  option  of  deleting  the  thresholding  in  front  of  the  BLT.  Then,  no  BLT 
neurons  are  needed;  the  BLT  is  just  a  "bidirectional  connection  box".  Also,  there  is  the 
option  of  using  an  analog  input  for  x.  Finally,  there  is  the  option  of  using  the  result  of  the 
BLT  backstroke  to  upgrade  the  SRM  input,  or  just  as  the  output  of  the  SRM. 

The DLS has the task to select, from the linear combination $u_b = c_\alpha h^\alpha{}_b$, the
Hadamard vector $h_\beta$ which occurs with the largest coefficient, i.e., $c_\beta$ is largest among the
$c_\alpha$, α=1 to N. But this means that $h_\beta$ is the Hadamard vector with the largest scalar
product $u \cdot h_\alpha$. Therefore, the DLS itself may be considered an associative memory with
stored states $h_\alpha$, α=1 to N. In those terms the DLS is to produce, from the input u, the
closest stored state, $h_\beta$.


DOMINANT  LABEL  SELECTOR  (DLS) 


The DLS, considered as an associative memory with stored states taken as the
Hadamard vectors $h_\alpha$, must find the Hadamard vector nearest to the BLT output u. A
Hopfield memory [1] cannot be used here, because it would saturate long before all the N
states are stored. Furthermore there is a problem due to orthonormality of the stored states
[9]. Instead, we have chosen for the DLS an associative memory with quadratic activation.
This memory, called "Quadratic Hadamard Memory", has been investigated in [9] by
means of the asynchronous discrete model, in which the activation is given by

$v_a = S_{abc}\, y^b y^c + r_a$ , (8)

and the signal $y_a$ is determined by thresholding $v_a$ with the signum function. The last term
$r_a$ may be seen either as an external coupling, or as defining thresholds. The connection
tensor $S_{abc}$ is restricted to be fully symmetric.

The quadratic activation expressed by (8) constitutes a case of "higher-order
neurons" [11]. Properties of quadratic activations have recently been discussed by Volper
and Hampson [12].


Stability of the DLS in the asynchronous discrete model is assured [11] if all
connection tensor components with at least two equal indices are zero,

$S_{app} = 0$ , for all a and p. (9)

It has been shown in [9] that the Hadamard states $h_\alpha$, α=1 to N, are stable in the
asynchronous discrete model if the connection tensor is chosen as

$S_{abc} = \sum_\alpha h_{\alpha a} h_{\alpha b} h_{\alpha c} - N\delta_{ab}\delta_{c1} - N\delta_{bc}\delta_{a1} - N\delta_{ca}\delta_{b1} + 2N\delta_{a1}\delta_{b1}\delta_{c1}$ , (10)

and if, moreover, we take

$r_a = r$ for all indices a, (11)

and $0 < r < N^2 - 2N$ . (12)

The  form  of  the  connection  tensor  (10)  is  similar  to  the  Hopfield  matrix  [1],  making 
allowance  for  the  quadratic  nature  of  the  internal  coupling.  The  last  four  terms  in  (10) 
have  been  added  to  satisfy  condition  (9),  while  retaining  full  symmetry.  To  check  that  the 
subtraction  works  one  needs  the  property  (A4)  of  the  Hadamard  vectors  used. 

It further has been shown in [9] that in the asynchronous discrete model no spurious
stable states exist if one takes (10) and (11), and if (12) is sharpened to

$N^2 - 4N < r < N^2 - 2N$ . (13)

In addition to the system with the subtracted connection tensor (10), we considered
in [9] an alternate system, in which the connection tensor is simply taken as

$S_{abc} = \sum_\alpha h_{\alpha a} h_{\alpha b} h_{\alpha c}$ , (14)

without any subtractions. In the asynchronous discrete model this DLS has been found [9]
to also have the properties mentioned above. Although this system has a simpler expression
for the connection tensor, it has a somewhat larger number of physical connections,
because (9) does not hold.
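
As a numerical check on (9), (10) and (14), the two connection tensors can be built explicitly for a small dimension. This is only a sketch under stated assumptions: a Sylvester-type Hadamard matrix whose first row and column are all +, array index 0 standing for the document index 1, and hypothetical function names.

```python
import numpy as np

def sylvester_hadamard(n):
    """Return an n x n Sylvester-type Hadamard matrix (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def unsubtracted_tensor(h):
    """Eq. (14): S[a, b, c] = sum_alpha h[alpha, a] * h[alpha, b] * h[alpha, c]."""
    return np.einsum("ia,ib,ic->abc", h, h, h)

def subtracted_tensor(h):
    """Eq. (10): the tensor of Eq. (14) minus the terms that enforce Eq. (9)."""
    n = h.shape[0]
    d = np.eye(n, dtype=h.dtype)   # Kronecker delta with two indices
    e1 = d[0]                      # delta_{a1}; document index 1 is array index 0
    s = unsubtracted_tensor(h)
    s = s - n * np.einsum("ab,c->abc", d, e1)
    s = s - n * np.einsum("bc,a->abc", d, e1)
    s = s - n * np.einsum("ca,b->abc", d, e1)
    s = s + 2 * n * np.einsum("a,b,c->abc", e1, e1, e1)
    return s
```

For the subtracted tensor, all components `s[a, p, p]` vanish and the tensor stays fully symmetric. For the unsubtracted tensor, the components `s[a, p, p]` equal N for a=1 and zero otherwise, which is why (9) fails there.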


CONTINUOUS  MODEL  FOR  DLS 


The main problem remaining after the discrete model investigation [9] was the
question of coupling of the BLT output $u_a$ to the DLS, taken as a quadratic Hadamard
memory. The discrete model cannot properly account for such coupling, and this is the
main reason for extending the investigation to the continuous model. The DLS dynamics is
then described by equations of motion for the activation $v_a$ of neuron a,

$\dot v_a = -v_a + S_{abc}\, y^b y^c + r_a$ . (15)

The dot denotes a time derivative, and the output signal $y_a$ of neuron a is given in terms of
the activation $v_a$ by

$y_a = s(v_a)$ , (16)

where s(v) is a soft sigmoid function which either attains the values ±1, or approaches
these values asymptotically. The function is chosen antisymmetric and such that

$s'(v) \ge 0$ for all v , (17)

where the prime denotes the derivative.

We need to discuss the coefficients of terms in (15). In electronic implementations,
the first two terms represent the lumped effects of amplifier input capacitance and
resistance. This may be expressed more clearly by writing the terms as $C\dot v_a$ and $-v_a/R$.
But C and R can be brought to unity by scaling of the time and the activation, together
with a related adjustment of the sigmoid function s. Hence, putting C and R to unity does
not constitute a physical restriction. We will proceed with the DLS dynamics in the
normalized form (15).


The connection tensor $S_{abc}$ in the equations of motion needs to be specified. Two
choices will be considered: the subtracted tensor (10) and the unsubtracted tensor (14). In
the asynchronous discrete model these two connection tensors give about the same results
[9]. In the sequel we will sometimes choose the subtracted tensor, and sometimes the
unsubtracted tensor, as determined by opportunities for theory development or clarification.
There also are results which hold for any symmetric connection tensor, and they will of
course be derived without having the tensor specified.


Because in the continuous model the set of signal states is the closed or open solid
hypercube, $[-1,1]^N$ or $(-1,1)^N$, there are many more possibilities for stationary
states than in the discrete model, where the signals are constrained to lie at the corner
points of the solid hypercube. Hence, the investigation of stationary and stable points of
the  continuous  DLS  involves  a  lot  more  territory.  In  addition,  there  is  the  question  of 
where  the  DLS  state  will  eventually  settle,  if  started  out  at  a  suitable  initial  state.  The 
nonlinearities  in  the  DLS  dynamics  make  it  difficult  to  integrate  the  equations  of  motion. 
Our  challenge  is  to  get  the  required  information  without  having  to  do  the  integration. 


NEURON  OUTPUT  FUNCTIONS 

Two  convenient  choices  have  been  made  for  the  neuron  output  function  s(v).  The 
first  is  the  hyperbolic  tangent, 


s(v)  =  tanh(gv)  ,  (18) 

where g is the gain at zero. Since the asymptotic values ±1 are not attained, the set of
signals is here the open solid hypercube $(-1,1)^N$.

The other output function used in this report is the piecewise linear function

$s(v) = -1$ if $v < -1/g$ ,
$s(v) = gv$ if $|v| \le 1/g$ , (19)
$s(v) = 1$ if $v > 1/g$ .

This function attains the values ±1, so that the set of signals is the closed solid hypercube
$[-1,1]^N$. In analytical work one has to watch the discontinuity in the derivative at
v=±1/g. Moreover, if any of the activations $v_a$ exceeds 1/g in magnitude, then the state
cannot be described unambiguously by the signal y, and one must use the activation v.
Either  sigmoid  function  has  its  advantages  and  disadvantages,  and  we  will  use  one  or  the 
other,  as  convenient.  Although  one  must  be  careful  not  to  claim  more  than  is  proved,  we 
expect  the  results  derived  to  remain  valid  for  other  similar  choices  of  output  function.  In 
the  numerical  computations  performed,  no  difference  was  noticed  when  one  function  was 
used  or  the  other,  as  long  as  the  gains  g  were  taken  the  same.  Properties  and  consequences 
of  the  two  output  functions  are  shown  in  Appendix  C. 
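
For reference, the two output functions (18) and (19) can be written as follows. This is a minimal sketch; the function names are hypothetical, and the clipped form used for (19) assigns the value gv at the boundary points v=±1/g, where the function is continuous.

```python
import numpy as np

def s_tanh(v, g):
    """Eq. (18): hyperbolic tangent output function with gain g at zero."""
    return np.tanh(g * np.asarray(v, dtype=float))

def s_piecewise(v, g):
    """Eq. (19): slope g in the linear range, clipped to -1 and +1 outside it."""
    return np.clip(g * np.asarray(v, dtype=float), -1.0, 1.0)
```

Both functions are antisymmetric and have slope g at v=0. Only the piecewise linear function attains the values ±1.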
THE ENERGY

For a fully symmetric connection tensor $S_{abc}$, the activation velocity vector $\dot v$ is curl
free in the space of signal vectors y (but not in the space of activations; see Appendix B).
Hence, the integral

$E(y) = -\int_0^y \dot v_a\, dy^a$ (20)

does not depend on which path is taken from the origin to the point y. E is a Liapunov
function since

$\dot E = -\sum_a (\dot v_a)^2 s'(v_a)$ (21)

is nonpositive, by (17). From (20) one finds


$E(y) = -\frac{1}{3} S_{abc}\, y^a y^b y^c - r_a y^a + \Omega$ , (22)

where

$\Omega = \sum_a \varphi(v_a)$ (23)

and

$\varphi(v) = \int_0^v \xi\, s'(\xi)\, d\xi$ . (24)

Although φ is defined in (24) as a function of v, the term Ω in (22) is taken as a function of
$y_a$, by application of the inverse of the mapping y=s(v). The inverse mapping is unique if
condition (17) is replaced by

$s'(v) > 0$ , for all v . (25)

Since the energy E of (22) is bounded, and $\dot E$ is nonpositive, we have

Theorem 1: The continuous DLS subject to the conditions posed is stable.
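
The Liapunov property can be illustrated numerically. The sketch below is not part of the analysis in this report, which deliberately avoids integrating the equations of motion. It assumes the unsubtracted tensor (14), the hyperbolic tangent output function, a Sylvester-type Hadamard matrix, and a plain forward Euler step with a small step size. For the tanh function the integral (24) is evaluated in closed form as v tanh(gv) - (1/g) ln cosh(gv), which follows by differentiation. All function names are hypothetical.

```python
import numpy as np

def hadamard(n):
    """Sylvester-type Hadamard matrix of size n (n a power of 2), as floats."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def velocity(v, h, r, g):
    """Right-hand side of Eq. (15) for the unsubtracted tensor of Eq. (14)."""
    y = np.tanh(g * v)
    p = h @ y                       # p[alpha] = h_alpha . y
    return -v + h.T @ (p * p) + r   # S_abc y^b y^c = sum_alpha h[alpha, a] * p[alpha]**2

def energy(v, h, r, g):
    """Eq. (22), with Omega from Eqs. (23) and (24) in closed form for tanh."""
    y = np.tanh(g * v)
    p = h @ y
    log_cosh = np.logaddexp(g * v, -g * v) - np.log(2.0)   # stable ln cosh(g v)
    omega = np.sum(v * y - log_cosh / g)
    return -np.sum(p ** 3) / 3.0 - r @ y + omega

def energy_trace(v0, h, r, g, dt=1e-3, steps=2000):
    """Integrate with forward Euler (an assumption) and record the energy."""
    v = np.array(v0, dtype=float)
    trace = [energy(v, h, r, g)]
    for _ in range(steps):
        v = v + dt * velocity(v, h, r, g)
        trace.append(energy(v, h, r, g))
    return np.array(trace), v
```

For a sufficiently small step size the recorded energy does not increase from step to step, in line with (21). The monotone decrease is a property of the continuous dynamics; a coarse Euler step can violate it.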


SPECIFIC  FORMS  FOR  EQUATIONS  OF  MOTION  AND  ENERGY 
For the subtracted connection tensor (10) we have from (15) the equations of motion

$\dot v_a = -v_a + N\sum_\alpha h_{\alpha a}\hat y_\alpha^2 - 2N y_a y_1 - N y_b y^b \delta_{a1} + 2N y_1^2 \delta_{a1} + r_a$ , (26)

where

$\hat y_\alpha = (1/\sqrt N)\, h_\alpha{}^a y_a$ (27)

is the Hadamard transform of y. The factor $1/\sqrt N$ is applied in order to preserve norms. In
(26), $y_1$ is the component $y_a$ for a=1, not the component $\hat y_\alpha$ for α=1.

It is sometimes convenient to see separately the components for a=1 and for a≠1:
$\dot v_1 = -v_1 + r_1$ , (28)

and

$a \neq 1$ , $\dot v_a = -v_a + N\sum_\alpha h_{\alpha a}\hat y_\alpha^2 - 2N y_a y_1 + r_a$ ; (29)

use has been made of (A6) and (A9). The energy is

$E = -\frac{1}{3}\left( N\sqrt{N}\sum_\alpha \hat y_\alpha^3 - 3N R^2 y_1 + 2N y_1^3 \right) - r_a y^a + \Omega$ , (30)

where Ω is given by (23) and (24), and R is the Euclidean norm of y, see (33). If the neuron
output function is a hyperbolic tangent as given by (18), then one has for Ω

$\Omega = \frac{1}{2g}\sum_a \{(1+y_a)\ln(1+y_a) + (1-y_a)\ln(1-y_a)\}$ , (31)


as  derived  in  Appendix  C.  If  the  output  function  is  taken  as  the  piecewise  linear  function 
(19),  then 


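$\Omega = R^2/(2g)$ , for $|y_a| < 1$ , (32)

where R is the Euclidean norm of y,

$R^2 = y_a y^a$ . (33)

For the unsubtracted connection tensor (14) the equations of motion are

$\dot v_a = -v_a + N\sum_\alpha h_{\alpha a}\hat y_\alpha^2 + r_a$ , (34)

which split into

$\dot v_1 = -v_1 + N y_a y^a + r_1$ , (35)

and

$a \neq 1$ , $\dot v_a = -v_a + N\sum_\alpha h_{\alpha a}\hat y_\alpha^2 + r_a$ , (36)

where, again, (A6) and (A9) have been used. The energy is then

$E = -\frac{1}{3} N\sqrt{N}\sum_\alpha \hat y_\alpha^3 - r_a y^a + \Omega$ . (37)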


For subtracted dynamics, the a=1 equation of motion (28) is uncoupled from the
rest. The solution is simply

$v_1 = r_1 + c\,e^{-t}$ , (38)

where c is a constant. Since all Hadamard vectors used here have + as first component, and
we want a Hadamard vector as DLS output, things must be arranged such that $y_1$
approaches unity for large times, which means that the activation $v_1$ must also be positive.
It follows that we must require

$r_1 > 0$ . (39)

For unsubtracted dynamics, the a=1 equation of motion is given by (35). Now there
is coupling to the scalar R of (33). The condition (39) then assures that $v_1$ does not
temporarily turn negative, which would slow the settling of the DLS.

For subtracted dynamics there is the option of omitting neuron #1 altogether, and
clamping the $y_1$ signal line to + permanently. There is no need for connections from
the $y_1$ line to the other neurons. We will refer to this arrangement as the $y_1$ clamping
arrangement. This amounts to putting

$y_1 = +$ , (40)

which gives the $-2N y_a y_1$ term in (29) the value $-2N y_a$. The first component, $u_1$, of the
BLT output u is ignored. That this can be done without penalty is related to the fact that
the Hadamard vectors used all have first component +. The $y_1$ clamping speeds up the
DLS action, as can be seen by monitoring numerical computations.

For unsubtracted dynamics, the coupling term $N y_a y^a$ in (35) is always positive,
and it tends to $N^2$ or a nearby value for large times. The value of $r_1$ needs to be chosen
such that $y_1$ tends to +1 for large times. The $y_1$ clamping scheme may also be used for the
DLS with unsubtracted dynamics.


COUPLING  SCHEMES 

The BLT output u must be coupled to the DLS. We see two ways of doing this. In the external coupling scheme the vector r in the equations of motion is chosen as

$$r_a = \mu\,(u_a + cN\delta_{a1}) , \qquad (41)$$

where $\mu > 0$ is a coupling constant, and N is the dimension. For a=1 the Kronecker $\delta_{a1}$ is unity, else zero. The term $cN\delta_{a1}$ applies a threshold $-\mu cN$ solely to the first neuron. This threshold has been written into r for later convenience, and the constant c will be chosen in due time. The external coupling scheme also requires a reset of the DLS activation to zero, every time a new BLT output u is applied. In the theory, such time is chosen as t=0, and the reset then fixes an initial value for the activation vector v(t):

$$v_a(0) = 0 , \ \forall\, a . \qquad (42)$$

For small times the quadratic term in the equations of motion (15) is negligible compared to $r_a$, so that we have

$$t \ll 1 , \quad \dot v_a = -v_a + r_a , \qquad (43)$$

with r given by (41). With the initial value (42) we have the integral

$$t \ll 1 , \quad v_a = \mu\,(u_a + cN\delta_{a1})(1 - e^{-t}) , \qquad (43)$$

which shows that the activation exponentially approaches $\mu\,(u_a + cN\delta_{a1})$. The e-folding time is unity here, because the RC time is unity, by the scaling that has thrown the equations of motion into the normalized form (15).

The second choice is the initial value coupling, for which the vector r is chosen as

$$r_a = cN\delta_{a1} , \qquad (44)$$

with c some constant to be determined, and the BLT output is applied as initial activation, after multiplication by the coupling constant $\mu$,

$$v_a(0) = \mu u_a . \qquad (45)$$

For small times we now have the integral

$$t \ll 1 , \quad v_a = cN\delta_{a1}(1 - e^{-t}) + \mu u_a e^{-t} . \qquad (46)$$

The activation components now approach the value $cN\delta_{a1}$, and the contribution of the applied BLT output u in the activation dies out.
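
The two small-time integrals can be compared numerically. The following is a minimal sketch, not part of the report: it evaluates the closed forms for external and initial value coupling and cross-checks them against a forward Euler integration of the linear equation $\dot v_a = -v_a + r_a$. The function names and the sample values of u, $\mu$ and c are illustrative assumptions, the list index 0 stands for the report's index a=1, and the formulas hold only while the quadratic term is negligible.

```python
import math

def external_coupling(u, mu, c, t):
    """Closed form for external coupling: v_a = mu*(u_a + c*N*delta_a1)*(1 - e^-t)."""
    n = len(u)
    r = [mu * (u_a + (c * n if a == 0 else 0.0)) for a, u_a in enumerate(u)]
    return [r_a * (1.0 - math.exp(-t)) for r_a in r]

def initial_value_coupling(u, mu, c, t):
    """Closed form for initial value coupling:
    v_a = c*N*delta_a1*(1 - e^-t) + mu*u_a*e^-t."""
    n = len(u)
    decay = math.exp(-t)
    r = [c * n if a == 0 else 0.0 for a in range(n)]
    return [r_a * (1.0 - decay) + mu * u_a * decay for r_a, u_a in zip(r, u)]

def euler_linear(r, v0, t, steps=10000):
    """Forward Euler integration of dv_a/dt = -v_a + r_a (quadratic term dropped)."""
    dt = t / steps
    v = list(v0)
    for _ in range(steps):
        v = [v_a + dt * (r_a - v_a) for v_a, r_a in zip(v, r)]
    return v
```

With external coupling the BLT output sets the asymptote of the linear stage, whereas with initial value coupling it only sets the starting point and decays with unit e-folding time.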

Comparing  the  two  coupling  schemes,  the  external  coupling  appears  to  have  the 
practical  advantage  that  the  DLS  input  and  output  are  separate.  The  BLT  output  vector  u 
remains  standing  on  the  DLS  external  input,  while  on  the  output  the  DLS  state  y  appears 
as  it  is  developing  in  time.  The  output  y  is  processed  by  the  BLT  in  its  backstroke,  with 
the  result  x'  appearing  at  the  front  of  the  BLT.  The  separation  of  the  DLS  input  and 
output  is  particularly  convenient  for  the  setup  in  which  the  BLT  front  output  x'  is  not 
used  to  upgrade  the  input  x  ,  but  is  considered  as  the  output  of  the  whole  machine. 


DYNAMIC  REGIONS 

It  is  helpful  to  distinguish  regions  in  signal  space  which  have  essentially  different 
dynamics.  These  regions  do  not  have  sharp  boundaries,  but  blend  smoothly  into  each  other. 
In  discussing  these  regions,  we  prefer  to  use  simple  albeit  imprecise  language  rather  than 
cumbersome  precision. 

In the proportional region of signal space the derivative $s'(v_a)$ is constant, either precisely or approximately. This region includes the origin, and it may have considerable extent, depending on the neuron output function s(v) used. For the piecewise linear function (19) the proportional region in signal space is given by $-1 < y_a < 1$.

In the proportional region $y_a = g v_a$, where g is the constant gain in the region, and the equations of motion (15) may be written

$$\frac{1}{g}\,\dot y_a = -\frac{1}{g}\,y_a + S_{abc}\,y_b y_c + r_a . \qquad (47)$$

For unsubtracted dynamics the equations of motion (47) can be decoupled by means of a Hadamard transform; with (14) one finds

$$\frac{1}{g}\,\dot{\hat y}_\alpha = -\frac{1}{g}\,\hat y_\alpha + \sqrt{N^3}\,\hat y_\alpha^2 + \hat r_\alpha , \qquad (48)$$

where $\hat y_\alpha$ is the Hadamard transform (27) of $y_a$, and $\hat r_\alpha$ is the Hadamard transform of $r_a$. For subtracted dynamics one finds with (10)

|  =  -  i  ya  +  ^  ya2  -2N  Vi  y  V«i+2^  ?i2hai+  ra  -  <49) 

where (A6) has been used. In (49) and throughout this report, $y_1$ stands for $y_a$ with a=1. If $y_1$ clamping is used, $y_1 = 1$ is to be substituted in (49). Even then, the equations of motion are not entirely uncoupled, because of the term with

$$\hat y_\alpha \hat y_\alpha = R^2 . \qquad (50)$$

Note that the norms given by (50) and (33) are equal, because of the orthogonality of the Hadamard transform.
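
The decoupling by the Hadamard transform can be checked numerically. The sketch below is illustrative only and rests on assumptions that are not confirmed by the text shown here: it uses Sylvester-type Hadamard matrices, takes the unsubtracted tensor (14) to be $S_{abc} = \sum_\alpha h_{\alpha a} h_{\alpha b} h_{\alpha c}$, and takes the transform (27) to be normalized by $1/\sqrt{N}$. Under these assumptions the transform of the quadratic term $S_{abc} y_b y_c$ should equal $\sqrt{N^3}\,\hat y_\alpha^2$, and the norm should be preserved.

```python
import math

def sylvester_hadamard(n):
    """n x n Sylvester Hadamard matrix as lists of +1/-1 (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def hadamard_transform(h, y):
    """Normalized transform: y_hat[alpha] = (1/sqrt(N)) * sum_a h[alpha][a] * y[a]."""
    n = len(y)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(h[al][a] * y[a] for a in range(n)) for al in range(n)]

def unsubtracted_quadratic(h, y):
    """S_abc y_b y_c for the assumed tensor S_abc = sum_alpha h[alpha][a]*h[alpha][b]*h[alpha][c]."""
    n = len(y)
    proj = [sum(h[al][b] * y[b] for b in range(n)) for al in range(n)]
    return [sum(h[al][a] * proj[al] ** 2 for al in range(n)) for a in range(n)]
```

Each transformed component then evolves on its own in the proportional region, which is what makes the one-dimensional analysis of (48) possible.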

The neuron output function s(v) is restricted to be of the sigmoid type. For estimation purposes it is convenient to introduce a value v*, which we call the critical activation, such that

$$\text{for } |v| > v^* , \quad |s(v)| > 1 - \epsilon = y^* , \qquad (51)$$

where $\epsilon$ is a positive number much smaller than unity, such as $\epsilon = 0.01$. Condition (17) or (25) and the antisymmetry of s(v) imply

$$\text{for } |v| > v^* , \quad s'(v) < s'(v^*) . \qquad (52)$$

We call an activation $v_a$ subcritical if $|v_a| < v^*$, and supercritical if $|v_a| > v^*$. For a supercritical activation $v_a$, the resulting signal $y_a$ may be taken as $\pm 1$, determined by the sign of $v_a$, as a suitable approximation in certain mathematical expressions. If (25) is true, sub- and supercriticality also can be stated in signal space: a signal $y_a$ is subcritical if $|y_a| < y^*$, and supercritical if $|y_a| > y^*$. Hence, in discussing sub- and supercriticality, we then need not say whether the state is considered in activation space or signal space.

Since the state vector v has N components, some components may be supercritical, while others may be subcritical. Hence we distinguish states that are entirely subcritical, partially supercritical, and entirely supercritical. A supercritical activation not only produces a signal that may be approximated as $\pm 1$, but also gives a derivative s' smaller than s'(v*), by (52). Which if any of these two properties of supercriticality is used is a matter of convenience.

When the state is entirely supercritical, the point y lies at or close to a corner point of the solid hypercube. The point set $\{ y \mid y^* < |y_a| \le 1 \}$, with a suitable y*, is called a corner. If a Hadamard point is included, we call the set a Hadamard corner.
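
The classification into entirely subcritical, partially supercritical, and entirely supercritical states can be sketched as follows. This is an illustration under the assumption that the output function is $s(v) = \tanh(gv)$, in which case the critical activation belonging to $\epsilon$ follows from $\tanh(g v^*) = 1 - \epsilon$. The function names and default value of $\epsilon$ are hypothetical choices made for the example.

```python
import math

def critical_activation(g, eps):
    """v* such that tanh(g * v*) = 1 - eps (assumes s(v) = tanh(g*v))."""
    return math.atanh(1.0 - eps) / g

def classify_state(v, g, eps=0.01):
    """Label an activation vector by how many components exceed v* in magnitude."""
    v_star = critical_activation(g, eps)
    n_super = sum(1 for v_a in v if abs(v_a) > v_star)
    if n_super == 0:
        return "entirely subcritical"
    if n_super == len(v):
        return "entirely supercritical"
    return "partially supercritical"
```

For example, with g = 10 and $\epsilon = 0.01$ the critical activation is about 0.265, so an activation vector such as [0.5, -0.1] would be labeled partially supercritical.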


STATIONARY  POINTS  OF  THE  IN 


It is a simple matter to determine whether a given point y is stationary. All one needs to do is calculate $\dot v$ from the equations of motion (26) or (34), and see whether the activation velocity $\dot v$ is zero. Alternatively, one can see whether at y the energy E has a zero gradient,

$$\partial_a E = 0 , \qquad (53)$$

where $\partial_a$ stands for $\partial/\partial y_a$.

As will be shown, thresholds can be chosen such that the Hadamard points are stationary. The question is whether there are other stationary points. To investigate this in the continuous model we use a method suggested by the study [9] of the asynchronous discrete model. We begin by considering the vector

$$Q_a = N \sum_\alpha h_{\alpha a}\,\hat y_\alpha^2 = \sum_{\alpha,b,c} h_{\alpha a} h_{\alpha b} h_{\alpha c}\, y_b y_c \qquad (54)$$

in the equation of motion (29) for the case with the subtracted connection tensor (10);

$$a \neq 1 , \quad \dot v_a = -v_a + Q_a - 2N y_a y_1 + r_a . \qquad (55)$$

We choose the piecewise linear output function given by (19). The gain must be chosen large enough so that some stationary points belong to $I_N$. The vector $Q_a$ can be rewritten with two Hadamard factors by using the group property of Hadamard vectors (see Appendix A),

$$h_{\alpha a} h_{\alpha b} = h_{\alpha d} , \qquad (56)$$

where

$$d = f(a,b) ; \qquad (57)$$

this allows rewriting (54) as

$$Q_a = \sum_{\alpha,b,c} h_{\alpha d} h_{\alpha c}\, y_b y_c . \qquad (58)$$

Using the group property once more,

$$h_{\alpha d} h_{\alpha c} = h_{\alpha e} , \qquad (59)$$

where

$$e = f(d,c) ; \qquad (60)$$

throws (58) in the form

$$Q_a = \sum_{\alpha,b,c} h_{\alpha e}\, y_b y_c . \qquad (61)$$

With (A4), (61) may be written

$$Q_a = N \sum_{b,c} \delta_{e1}\, y_b y_c . \qquad (62)$$

In order to contribute to the sum, the term $\delta_{e1} y_b y_c$ must have e=1; with (60) and (A17) this means that d=c. With (57) and (A22) this implies that

$$a = f(b,c) . \qquad (63)$$

Hence, (62) may be written

$$Q_a = N \sum_{(b,c) \in \mathcal{N}_a} y_b y_c , \qquad (64)$$

where $\mathcal{N}_a$ is the set of index pairs

$$\mathcal{N}_a = \{ (b,c) \mid f(b,c) = a \} . \qquad (65)$$

We restrict to signals that lie on corner points of the solid hypercube, i.e., $y \in I_N$.

As further preparation, we state here as Lemma 1 the Theorem 2 of [9]; for convenience, the proof [9] is repeated in Appendix D.

Lemma 1: $y \in I_N$, $Q_b = -N^2$, $\forall\, b$ such that $y_b = -$ $\iff$ y is Hadamard $\neq h_1$.

We are now prepared to show

Lemma 2: $y \in I_N$, not Hadamard $\implies$ $\exists$ index b such that $y_b = -$ and $Q_b \ge -N^2 + 4N$.

Proof: Let $y \in I_N$ and not Hadamard. This implies that y is not $h_1$, so that the set $A = \{ b \mid y_b = - \}$ is not empty. Choose an index b such that $y_b = -$ and $Q_b \neq -N^2$; this is always possible, since otherwise Lemma 1 would imply that y is Hadamard $\neq h_1$, which is false. We use the expression (64), with a taken as this index. There are N terms in the sum, since for every b=1 to N the remaining index c is determined by the condition that the pair (b,c) lies in the set $\mathcal{N}_a$. Since $y \in I_N$, all terms $y_b y_c$ are either 1 or -1. The terms cannot all be -, since $Q_a \neq -N^2$. Hence, the sum contains at least one + term. Say that term is $y_p y_q$. But then, the term $y_q y_p$ is + as well. It follows that the sum (64) contains at least two + terms. Since flipping the sign of a single term from - to + causes the sum to change by 2, we have $Q_a \ge -N^2 + 4N$. □
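
Lemmas 1 and 2 can be verified by brute force for small N. The sketch below is an illustration under stated assumptions rather than the report's construction: it uses Sylvester-type Hadamard matrices, for which the group property holds with $f(b,c)$ equal to the bitwise XOR of zero-based indices, and it takes $Q_a$ from (54). The function names are hypothetical.

```python
from itertools import product

def sylvester_hadamard(n):
    """n x n Sylvester Hadamard matrix as lists of +1/-1 (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def q_vector(h, y):
    """Q_a = sum_alpha h[alpha][a] * (sum_b h[alpha][b]*y[b])**2, as in (54)."""
    n = len(y)
    proj = [sum(h[al][b] * y[b] for b in range(n)) for al in range(n)]
    return [sum(h[al][a] * proj[al] ** 2 for al in range(n)) for a in range(n)]

def q_vector_pairs(y):
    """Q_a from (64): N times the sum of y_b*y_c over pairs with f(b,c) = a.
    For the Sylvester construction with zero-based indices, f(b,c) = b XOR c."""
    n = len(y)
    return [n * sum(y[b] * y[b ^ a] for b in range(n)) for a in range(n)]

def check_lemmas(n):
    """Check (64), Lemma 1 and Lemma 2 on every corner point of the hypercube."""
    h = sylvester_hadamard(n)
    hadamard = {tuple(row) for row in h}
    for y in product((1, -1), repeat=n):
        q = q_vector(h, y)
        assert q == q_vector_pairs(y)
        minus = [b for b in range(n) if y[b] == -1]
        if y in hadamard:
            # Lemma 1: every minus component of a Hadamard point has Q_b = -N^2
            assert all(q[b] == -n * n for b in minus)
        else:
            # Lemma 2: some minus component has Q_b >= -N^2 + 4N
            assert any(q[b] >= -n * n + 4 * n for b in minus)
    return True
```

The exhaustive loop visits $2^N$ corner points, so it is practical only for small dimensions such as N = 4 or N = 8.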

Lemma 2 has an important application to the equations of motion (55), with the threshold term chosen as

$$a \neq 1 , \quad r_a = (N^2 - 4N)\, h_{1a} + \mu u_a , \qquad (66)$$

where $h_1$ is the Hadamard vector with all components +, $u_a$ is the output of the BLT, and $\mu > 0$ is a coupling constant. With (66), the BLT is coupled to the DLS by applying the BLT output as an external coupling to the DLS. It is convenient to put a bound on the coupling constant,

$$a \neq 1 , \quad |\mu u_a| < 2N - 1/g , \qquad (67)$$

and restrict the gain g by

$$g > 5N . \qquad (68)$$

Moreover we take from here on

$$N > 4 . \qquad (69)$$

We have

Theorem 2: For a continuous DLS with subtracted connection tensor (10), a piecewise linear output function, thresholds subject to (66) and (67), and a gain subject to (68), the only stationary points of $I_N$ are the Hadamard points.


Proof: First we show that the Hadamard points are stationary. For a Hadamard point $y = h_\gamma$ one has

$$Q_a = N^2 h_{\gamma a} , \qquad (70)$$

and the equation of motion (55) with thresholds given by (66) reads

$$a \neq 1 , \quad \dot v_a = -v_a + N^2 h_{\gamma a} - 2N h_{\gamma a} + N^2 - 4N + \mu u_a , \qquad (71)$$

where $y_1 = +$ has been substituted.

For indices a such that $h_{\gamma a} = -$ the $N^2$ terms cancel, and (71) gives

$$\dot v_a + v_a = -2N + \mu u_a < -1/g , \qquad (72)$$

while for the remaining indices $a \neq 1$ we have

$$\dot v_a + v_a = 2N^2 - 6N + \mu u_a > 1/g ; \qquad (73)$$

conditions (67), (68), and (69) have been used. (72) shows $\exists\, v_a < -1/g$ such that $\dot v_a = 0$; the inequality $v_a < -1/g$ is consistent with $y_a = -1$. (73) shows $\exists\, v_a > 1/g$ such that $\dot v_a = 0$; $v_a > 1/g$ is consistent with $y_a = 1$. The a=1 equation of motion (28),

$$\dot v_1 = -v_1 + r_1 , \qquad (74)$$

shows no coupling with other neurons, and can either be implemented as is, with $r_1 > 0$, or may be cast aside in favor of $y_1$ clamping, as discussed before.

It follows that the Hadamard point $y = h_\gamma$ is stationary. To show that there are no other stationary points, consider a point y of the $I_N$ that is not Hadamard. By Lemma 2 there exists an index b such that $y_b = -$ and $Q_b \ge -N^2 + 4N$. For such an index, the equation of motion (55) gives

$$\dot v_b + v_b = Q_b + 2N + N^2 - 4N + \mu u_b \ge 2N + \mu u_b > 0 . \qquad (75)$$

The point cannot be stationary because of a conflict in signs of $v_b$ and $y_b$. □
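
The statement of Theorem 2 can be illustrated by enumeration for a small dimension. The sketch below makes several assumptions that are not taken from the report: Sylvester-type Hadamard matrices, $y_1$ clamped to +, a piecewise linear output $s(v)$ that saturates at $\pm 1$ for $|v| \ge 1/g$, and an arbitrary sample vector for $\mu u_a$ within the bound (67). A corner point is then accepted as stationary if, for every $a \neq 1$, the stationary activation $v_a = Q_a - 2N y_a y_1 + (N^2 - 4N) + \mu u_a$ saturates with the sign of $y_a$.

```python
from itertools import product

def sylvester_hadamard(n):
    """n x n Sylvester Hadamard matrix as lists of +1/-1 (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def stationary_corners(n, mu_u, g):
    """Corner points with y_1 = + that are stationary under (55) with thresholds (66).
    mu_u[a] is the product mu*u_a; index 0 plays the role of the report's a = 1."""
    h = sylvester_hadamard(n)
    found = []
    for tail in product((1, -1), repeat=n - 1):
        y = (1,) + tail
        proj = [sum(h[al][b] * y[b] for b in range(n)) for al in range(n)]
        consistent = True
        for a in range(1, n):
            q_a = sum(h[al][a] * proj[al] ** 2 for al in range(n))
            v_a = q_a - 2 * n * y[a] * y[0] + (n * n - 4 * n) + mu_u[a]
            # the saturated output must reproduce y_a: y_a * v_a >= 1/g
            if y[a] * v_a < 1.0 / g:
                consistent = False
                break
        if consistent:
            found.append(y)
    return found
```

For N = 8 this loop visits 128 corner points; under the assumptions above the surviving points should coincide with the eight Hadamard vectors.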

Condition (69) does not constitute a restriction in practical applications. The gain condition (68) is satisfied in practice because we want large gains in order that the DLS settles fast. With (69), condition (67) is satisfied if

$$\mu < 1 , \qquad (76)$$

because

$$|u_a| < N , \qquad (77)$$

as follows from (3), $c_\alpha = x \cdot q_\alpha$, and the bipolar nature of the vectors x and $q_\alpha$. Condition (76) is convenient in practice, since it implies that no amplification is needed between the BLT rear output and the DLS input.


STABILITY 

The stability of stationary points can be investigated either with the energy or with the equations of motion.

Writing $\partial_a$ for $\partial/\partial y_a$, we have for unsubtracted dynamics, from (37),

$$\partial_a E = v_a - N \sum_\alpha h_{\alpha a}\,\hat y_\alpha^2 - r_a \qquad (78)$$

and

$$\partial_a \partial_b E = \delta_{ab}/s'(v_b) - 2\sqrt{N} \sum_\alpha h_{\alpha a} h_{\alpha b}\,\hat y_\alpha . \qquad (79)$$

Since away from stationary points $\dot E$ of (21) is negative, the stationary point y is asymptotically stable iff the tensor $\partial_a \partial_b E$ at y is positive definite. The point is unstable iff the tensor has a negative eigenvalue. It is difficult to use these conditions because of the involvement of the Hadamard matrices in the tensor (79). But, if we are willing to give up some bound sharpness, a very simple condition can be stated in terms of the tensor trace; if it is negative, then there must be a negative eigenvalue. Hence $\partial_a \partial_a E < 0$ implies that the stationary point is unstable. With (79), (A2), and (A4) it follows after a short calculation that

$$\sum_a \bigl( 1/s'(v_a) \bigr) - 2N^2 y_1 < 0 \implies y \text{ is unstable.} \qquad (80)$$

This sufficient condition for instability only involves $y_1$ and the sum of the reciprocal sigmoid derivatives s' for the neurons. Since s' is nonnegative by (17) or (25), cancellations cannot occur in the sum over a. Therefore, satisfaction of the condition requires that none of the derivatives $s'(v_a)$ be small. Roughly, this means that the state must be entirely subcritical in order that instability can be concluded from (80).

With about the same effort a much sharper sufficient condition for instability can be derived from the equations of motion. These may be seen as expressing the flow velocity $\dot v$ in activation space. The flow divergence is related to instability, as will be shown presently.

Writing $\partial_a$ for $\partial/\partial v_a$, we have from (34)

$$\partial_b \dot v_a = -\delta_{ab} + 2\sqrt{N} \sum_\alpha h_{\alpha a} h_{\alpha b}\,\hat y_\alpha\, s'(v_b) . \qquad (81)$$

At a stationary point v the velocity $\dot v_a$ vanishes. The velocity at a point $\delta v$ away from v is given by $\delta v_b\, \partial_b \dot v_a$. The question is whether the radial component of this velocity is pointing away from the point v or towards it. If the former is the case for some vector $\delta v_a$, the point v is unstable; if the latter occurs for every vector $\delta v_a$, then the point is stable. The sign of the radial component of the vector $\delta v_b\, \partial_b \dot v_a$ is given by the sign of the scalar product $\delta v_a\, \delta v_b\, \partial_b \dot v_a$. It follows that v is unstable if the tensor $\partial_a \dot v_b + \partial_b \dot v_a$ has a positive eigenvalue. This is the case if the trace is positive. The trace equals twice the flow divergence

$$\partial_a \dot v_a = -N + 2N y_1 \sum_a s'(v_a) . \qquad (82)$$

We have

$$\partial_a \dot v_a > 0 \implies v \text{ is unstable,} \qquad (83)$$

and with (82) the condition reads

$$1 - 2 y_1 \sum_a s'(v_a) < 0 \implies v \text{ is unstable.} \qquad (84)$$

As in (80), the condition involves only $y_1$ and the derivatives s'. However, (84) is much sharper than (80): all that is required for the satisfaction of the inequality is that for a single neuron the sigmoid derivative s' is sizeable. Roughly, this means that partially or entirely subcritical stationary states are unstable. A further advantage of the condition (84) over (80) is the absence of the factor $N^2$, which is very large for the large dimensions expected to be important in practice.


We proceed with application of (84) to the case that the neuron output function is taken as the hyperbolic tangent (18). Then we have from the Appendix, (C3),

$$s'(v_a) = g\,(1 - s^2(v_a)) = g\,(1 - y_a^2) . \qquad (85)$$

The condition (84) then reads

$$1 - 2g y_1 \sum_a (1 - y_a^2) < 0 \implies y \text{ is unstable.} \qquad (86)$$

For partially or entirely subcritical signals y

$$\exists\, b \text{ such that } |y_b| < y^* , \qquad (87)$$

where

$$y^* = s(v^*) , \qquad (88)$$

and v* is a suitably chosen critical activation. The inequality in (86) is satisfied if $0 < \eta < 1$, $2g\eta\,(1 - y^{*2}) > 1$, $y_1 > \eta$, and (87) is true. Since for $0 < \epsilon < 1$ and $y^* = 1 - \epsilon$ we have $1 - y^{*2} > \epsilon$, it follows that for partially or entirely subcritical signals

$$2g\eta\epsilon > 1 , \ y_1 > \eta , \ |y_b| < y^* \implies y \text{ is unstable.} \qquad (89)$$

The number $\eta$ may be chosen freely, as long as $0 < \eta < 1$. A convenient choice is

$$\eta = 1/2 ; \qquad (90)$$

then, the condition $2g\eta\epsilon > 1$ in (89) becomes

$$\epsilon > 1/g . \qquad (91)$$

Hence, we have

Theorem 3: For a DLS with unsubtracted dynamics and a hyperbolic tangent output function with gain g, any stable point must lie either in the region $y_1 < 1/2$, or in the entirely supercritical region with criticality parameter $\epsilon > 1/g$.
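
The difference in sharpness between the energy-based condition (80) and the divergence-based condition (86) can be made concrete. The sketch below is illustrative: it assumes the hyperbolic tangent output with $s'(v_a) = g(1 - y_a^2)$, uses list index 0 for the report's $y_1$, and requires $|y_a| < 1$ for every component. The function names and the sample state are hypothetical.

```python
def energy_trace_criterion(y, g):
    """Left side of (80) with s'(v_a) = g*(1 - y_a^2); a negative value implies instability."""
    n = len(y)
    return sum(1.0 / (g * (1.0 - y_a * y_a)) for y_a in y) - 2.0 * n * n * y[0]

def divergence_criterion(y, g):
    """Left side of (86): 1 - 2*g*y_1*sum_a(1 - y_a^2); a negative value implies instability."""
    return 1.0 - 2.0 * g * y[0] * sum(1.0 - y_a * y_a for y_a in y)

def unstable_by_theorem3(y, g, eps):
    """Sufficient condition (89) with eta = 1/2: eps > 1/g, y_1 > 1/2,
    and at least one component with |y_b| < 1 - eps."""
    return eps > 1.0 / g and y[0] > 0.5 and any(abs(y_b) < 1.0 - eps for y_b in y)
```

For a state with one clearly subcritical component and the rest near saturation, the divergence criterion already signals instability while the energy trace criterion stays inconclusive, in line with the remark that (84) is the sharper of the two.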

As will be shown later, the only stable point in the region $y_1 < 1/2$ is at or near the origin, and it can be eliminated by proper choice of coupling constant $\mu$. By Theorem 3, the remaining stable points must lie in the corners of the solid hypercube with positive $y_1$. Of course, the Hadamard corners are of special interest; we want to know whether they contain a stable point. Theorem 2 states that Hadamard points are stationary if certain conditions are satisfied, which includes the threshold condition (66). But it turns out that the large magnitude of the threshold in (66) spoils the early dynamics, and therefore we will need to diminish the threshold below the value given by (66). In preparation, we must find a range of thresholds that straddles the origin, and which assures that every Hadamard corner contains a stable point. The first step is to find a threshold range such that every Hadamard corner contains a stationary point. We proceed as follows.

For subtracted dynamics we have the equations of motion (29),

$$a \neq 1 , \quad \dot v_a = -v_a + Q_a - 2N y_a y_1 + r_a , \qquad (92)$$

with $Q_a$ given by (54). We wish to find bounds on the threshold such that the Hadamard points are stationary. If we use the hyperbolic tangent output function (18), the Hadamard points are not attainable; we then consider the signal

$$y_a = (1 - \epsilon_a)\, h_{\gamma a} , \qquad (93)$$

where

$$0 < \epsilon_a < .02 , \ \forall\, a . \qquad (94)$$

The value .02 has been chosen for convenience in a manner that need not be discussed here. For this signal the equation of motion (92) gives

$$a \neq 1 , \quad \dot v_a = -v_a + N^2 (1 - \epsilon_a)^2 h_{\gamma a} - 2N (1 - \epsilon_a)\, h_{\gamma a}\, (1 - \epsilon_1) + r_a . \qquad (95)$$

We investigate whether the signal can be stationary. Putting $\dot v_a = 0$ gives

$$a \neq 1 , \quad v_a = N (1 - \epsilon_a) \bigl( N (1 - \epsilon_a) - 2 (1 - \epsilon_1) \bigr)\, h_{\gamma a} + r_a . \qquad (96)$$

If

$$a \neq 1 , \quad |r_a| < N^2/4 , \qquad (97)$$

then the sign of $v_a$ as determined by (96) is the same as the sign of $y_a$ given by (93), since

$$N (1 - \epsilon_a) \bigl( N (1 - \epsilon_a) - 2 (1 - \epsilon_1) \bigr) - N^2/4 > 0 \qquad (98)$$

for all $\epsilon_a$ subject to (94). It remains to calculate $\epsilon$ such that the signal y given by (93) is entirely supercritical, with criticality parameter $\epsilon$. For r subject to (97), Eq. (96) implies

$$a \neq 1 , \quad |v_a| > N (1 - \epsilon) \bigl( N (1 - \epsilon) - 2 \bigr) - N^2/4 , \qquad (99)$$

provided that $\epsilon$ has been chosen such that

$$\epsilon_a < \epsilon , \ \forall\, a . \qquad (100)$$

Let g* be determined such that

$$N (1 - \epsilon) \bigl( N (1 - \epsilon) - 2 \bigr) - N^2/4 = \frac{1}{2g^*} \ln(2/\epsilon) . \qquad (101)$$

(99) can then be written

$$a \neq 1 , \quad |v_a| > \frac{1}{2g^*} \ln(2/\epsilon) > \frac{1}{2g} \ln(2/\epsilon) , \qquad (102)$$

for all gains g such that

$$g > g^* . \qquad (103)$$

By (C6) we have

$$\frac{1}{2g} \ln(2/\epsilon) > v^* , \qquad (104)$$

where v* is the critical activation belonging to $\epsilon$, as defined by (C4). It follows that (102) implies

$$a \neq 1 , \quad |v_a| > v^* . \qquad (105)$$

For a=1 we have (28), and at the stationary point the activation $v_1$ can be made as large as desired by choosing $r_1$ large enough. Alternatively, one can use the $y_1$ clamping scheme and put $y_1 = +$. Together with (105) and (100) it follows that the state (93) is entirely supercritical.

The bound (97) is sloppy, but very generous for practical applications. We must see whether the gains g subject to (103) and (101) have practical values. For N subject to (69) it was found that

$$.001 < \epsilon < .02 , \quad g^* < 1 . \qquad (106)$$

Since we want large gains for fast DLS settling, the condition (103) does not constitute a limitation in practice.
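
The order of magnitude of g* can be checked with a few lines of code. The sketch below is an illustration under assumptions: it reads (101) as setting the margin $N(1-\epsilon)(N(1-\epsilon)-2) - N^2/4$ equal to $\ln(2/\epsilon)/(2g^*)$, and it takes the critical activation for a hyperbolic tangent output $s(v) = \tanh(gv)$ from $\tanh(g v^*) = 1 - \epsilon$. The function names are hypothetical.

```python
import math

def supercritical_margin(n, eps):
    """Lower bound on |v_a| from (99): N(1-eps)(N(1-eps)-2) - N^2/4."""
    return n * (1.0 - eps) * (n * (1.0 - eps) - 2.0) - n * n / 4.0

def g_star(n, eps):
    """Gain g* from (101), read here as margin = ln(2/eps) / (2 g*)."""
    return math.log(2.0 / eps) / (2.0 * supercritical_margin(n, eps))

def critical_activation(g, eps):
    """v* with tanh(g * v*) = 1 - eps (assumes s(v) = tanh(g*v))."""
    return math.atanh(1.0 - eps) / g
```

For N = 8 and $\epsilon$ between .001 and .02 this gives values of g* of roughly 0.1, well below unity, and g* decreases further as N grows.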

It  follows  that  a  DLS  with  subtracted  dynamics  and  hyperbolic  tangent  output 
function  has  a  stationary  point  in  every  Hadamard  corner,  provided  that  (97)  is  satisfied. 

For unsubtracted dynamics the argument is much the same. The term $-2N y_a y_1$ is then missing from (92), so that instead of (96) we have

$$a \neq 1 , \quad v_a = N^2 (1 - \epsilon_a)^2 h_{\gamma a} + r_a . \qquad (107)$$

In this case, we need only invoke (101); the same bound (97) then assures the existence of a stationary point in every Hadamard corner, for a DLS with unsubtracted dynamics.

It is easy to see that the conclusions remain valid if the hyperbolic tangent output function is replaced by the piecewise linear function given by (19), for both dynamics considered. Of course, since the outputs ±1 are attainable, the stationary point in a Hadamard corner lies precisely at the Hadamard point.

We  have  shown 

Theorem  4:  For  a  DLS  with  either  subtracted  or  unsubtracted  dynamics,  with  the  output 
function  taken  either  as  a  hyperbolic  tangent  or  as  a  piecewise  linear  function,  there  exists 
a  stationary  point  in  every  Hadamard  corner,  if  the  bound  (97)  on  thresholds  is  satisfied, 
and  the  gain  is  at  least  the  g*  determined  from  (101). 

Next, we investigate the stability of these stationary points in Hadamard corners. From (79) we have

$$\delta^2 E = \sum_a \delta y_a^2 / s'(v_a) - 2\sqrt{N^3} \sum_\alpha \delta\hat y_\alpha^2\, \hat y_\alpha , \qquad (108)$$

where $\delta^2 E = \delta y_a\, \delta y_b\, \partial_a \partial_b E$ is the second variation of the energy due to a displacement $\delta y$ (the first variation vanishes since y is stationary).

For the hyperbolic tangent output function (18) one has (see Appendix, (C3))

$$s'(v_a) = g\,(1 - s^2(v_a)) = g\,(1 - y_a^2) , \qquad (109)$$

and  (108)  becomes 

A  =  l  «ya2/(l-ya2)  -  2i/N3  g  Sya2ya  .  (110) 

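Let the index b be such that $|v_b|$ is smallest among the $|v_a|$. Then $y_b^2$ is smallest among the $y_a^2$, and we have

$$1/(1 - y_a^2) \ge 1/(1 - y_b^2) , \ \forall\, a . \qquad (111)$$

Hence, from (110) we have

$$\delta^2 E \ge \delta y \cdot \delta y / (1 - y_b^2) - 2\sqrt{N^3} \sum_\alpha \delta\hat y_\alpha^2\, \hat y_\alpha . \qquad (112)$$

The smallest rhs occurs when $\sum_\alpha \delta\hat y_\alpha^2\, \hat y_\alpha$ is maximum, while $\delta y$ is constrained to have fixed norm. In this regard we have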

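Lemma 3: Let there be an index $\beta$ such that $\hat y_\beta > \hat y_\alpha$, $\forall\, \alpha \neq \beta$. For $U_\alpha$ subject to $U_\alpha U_\alpha = 1$, the maximum of $X = \sum_\alpha U_\alpha^2\, \hat y_\alpha$ then occurs at $U_\alpha = \delta_{\alpha\beta}$.

Proof: The stationary points of X, subject to

$$U_\alpha U_\alpha = 1 , \qquad (113)$$

are found from the stationary points of $F = X - \lambda\,(U_\alpha U_\alpha - 1)$, where $\lambda$ is a Lagrangian multiplier. One has

$$0 = \partial F/\partial U_\alpha = 2 U_\alpha \hat y_\alpha - 2\lambda U_\alpha . \qquad (114)$$

$\lambda$ is found by multiplication by $U_\alpha$, summing over $\alpha$, and using (113); the result is

$$\lambda = X . \qquad (115)$$

Substitution in (114) gives

$$0 = U_\alpha\,(\hat y_\alpha - X) , \qquad (116)$$

which implies

$$\text{either } U_\alpha = 0 , \text{ or } \hat y_\alpha = X . \qquad (117)$$

It follows that X is maximum for

$$U_\alpha = \delta_{\alpha\beta} , \qquad (118)$$

where $\beta$ is the index such that $\hat y_\beta > \hat y_\alpha$, $\forall\, \alpha \neq \beta$. □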
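
Lemma 3 can be spot-checked by random sampling. The following sketch is illustrative only: it draws random unit vectors U, evaluates $X = \sum_\alpha U_\alpha^2\, \hat y_\alpha$, and compares the result with the value attained at $U_\alpha = \delta_{\alpha\beta}$. The function names and the sample values of $\hat y$ are hypothetical.

```python
import math
import random

def weighted_sum(u, y_hat):
    """X = sum_alpha U_alpha^2 * y_hat_alpha."""
    return sum(u_a * u_a * y_a for u_a, y_a in zip(u, y_hat))

def random_unit_vector(n, rng):
    """Random vector of unit Euclidean norm, from normalized Gaussian samples."""
    while True:
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm > 1e-12:
            return [x / norm for x in u]

def max_over_samples(y_hat, samples=1000, seed=0):
    """Largest X found over random unit vectors; should not exceed max(y_hat)."""
    rng = random.Random(seed)
    return max(weighted_sum(random_unit_vector(len(y_hat), rng), y_hat)
               for _ in range(samples))
```

Since X is a convex combination of the components $\hat y_\alpha$ with weights $U_\alpha^2$, no sample can exceed the largest component, which is the content of the lemma.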

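For a stationary point in a Hadamard corner of the Hadamard vector $h_\gamma$ the maximum component $\hat y_\alpha$ is $\hat y_\gamma$. Application of Lemma 3 then gives for the maximum of the term $2\sqrt{N^3} \sum_\alpha \delta\hat y_\alpha^2\, \hat y_\alpha$ in (112), with $\delta y$ subject to a fixed norm, the value

$$2\sqrt{N^3}\; \delta y \cdot \delta y\; \hat y_\gamma < 2N^2\, \delta y \cdot \delta y . \qquad (119)$$

Hence, (112) implies

$$\delta^2 E > \delta y \cdot \delta y\, \bigl( 1/(1 - y_b^2) - 2N^2 \bigr) . \qquad (120)$$

The rhs is positive if

$$1/(1 - y_b^2) > 2N^2 . \qquad (121)$$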


For  ease  of  reference  we  state  the  result, 


Lemma  4:  For  a  DLS  with  unsubtracted  dynamics  the  stationary  points  in  Hadamard 
corners  are  stable  if  (121)  is  satisfied. 


We must find a convenient inequality which implies (121). For the hyperbolic tangent output function y=s(v) of (18) we have

$$1/(1 - y^2) = \cosh^2(gv) = (e^{gv} + e^{-gv})^2/4 ; \qquad (122)$$

hence, (121) is satisfied if

$$e^{g|v|} > 2N\sqrt{2} , \qquad (123)$$

where v is the activation with the smallest magnitude. (123) is equivalent to

$$|v| > \frac{1}{g} \ln(2N\sqrt{2}) . \qquad (124)$$

The equilibrium equations of motion (96) may be used to find a condition on $r_a$ that implies (124). From (96) we have for the minimum magnitude v of $|v_a|$

$$v > N (1 - \epsilon) \bigl( N (1 - \epsilon) - 2 \bigr) - |r_a| ; \qquad (125)$$

with the bounds (94) and (97) the inequality (125) gives

$$v > .71N^2 - .98N . \qquad (126)$$

Hence, (124) is satisfied if

$$.71N^2 - .98N > \frac{1}{g} \ln(2N\sqrt{2}) . \qquad (127)$$

With (69), the inequality is true for g>1/4. This can be seen by a direct calculation for N=4 and g=1/4, and the fact that the lhs of (127) increases more rapidly with N than the rhs. Hence, we have

Theorem 5: For a DLS with unsubtracted dynamics and hyperbolic tangent output function, the stationary point of Theorem 4 is stable if N>4, g>1/4, and $|r_a| < N^2/4$ for $a \neq 1$.


The  conditions  of  Theorems  4  and  5  are  easily  satisfied  in  practice.  Hence  we  have 
the  result  that  our  DLS  has  a  stable  point  in  each  Hadamard  corner,  for  a  large  range  of 
thresholds  that  straddle  the  origin.  Theorems  4  and  5  are  in  agreement  with  Theorem  3  for 
gains  g>50.  The  bounds  used  for  the  derivation  of  Theorem  3  are  very  sloppy,  and  have 
been  chosen  in  order  to  keep  down  the  analytical  work.  (91)  may  be  replaced  by  a  less 
stringent  condition  by  using  tighter  bounds.  In  any  case,  Theorems  4  and  5  by  themselves 
suffice  to  assure  a  stable  point  in  every  Hadamard  corner. 


ENERGY  LANDSCAPE 

The energy provides a simple and natural way to visualize the dynamics. The energy is a scalar function in signal space, and by ignoring N-2 dimensions in some vague way we can imagine the energy function as a surface in three dimensional space. A point on the surface depicts the signal y as the projection on the horizontal plane, and the energy E(y) as the height of the point. The dynamics drives the signal point down the energy surface, but generally not along the steepest path. This may be seen as follows. From (20) one has

$$\partial_a E = -\dot v_a = -\dot y_a / s'(v_a) ; \qquad (128)$$

hence, in signal space the state changes in time by

$$\dot y_a = -(\partial_a E)\, s'(v_a) . \qquad (129)$$

Unless the derivatives $s'(v_a)$ have the same value for all a, the direction of the signal velocity $\dot y$ differs from the direction of -grad E. In the proportional region, these directions are the same because $s'(v_a) = g$, $\forall\, a$. Hence, in the proportional region, the signal point moves down the energy surface along the steepest path.

We explore the energy landscape for the case of maximum symmetry,

$$r = 0 ; \qquad (130)$$

application of a nonzero vector r would amount to tilting the energy surface, and this would cause a change in the stationary points that is easily visualized. With (130), the energy (22) is

$$E = -\tfrac{1}{3}\, S_{abc}\, y_a y_b y_c + \Omega . \qquad (131)$$

We restrict signals to lie in the proportional region. With R the Euclidean norm of the signal, as given by (33), one then has

$$\Omega = R^2/(2g) \qquad (132)$$

exactly for the piecewise linear output function, and approximately for the hyperbolic tangent, as shown in Appendix C. Hence, for either output function we have

$$E = -\tfrac{1}{3}\, S_{abc}\, y_a y_b y_c + R^2/(2g) . \qquad (133)$$

A few results may be derived without specifying the form of the connection tensor $S_{abc}$, beyond symmetry. Writing $\partial_a$ for $\partial/\partial y_a$, we have, from (133),

$$\partial_a E = -S_{abc}\, y_b y_c + y_a/g , \qquad (134)$$

$$\partial_a \partial_b E = -2 S_{abc}\, y_c + \delta_{ab}/g . \qquad (135)$$

At stationary states one has $\partial_a E = 0$, so that (134) gives

$$y_a = g\, S_{abc}\, y_b y_c . \qquad (136)$$

This equation has the solution

$$y = 0 , \qquad (137)$$

and perhaps other solutions as well, which we denote by $y_r$. The stationary point given by (137) is the origin; at that point, (135) gives

$$\partial_a \partial_b E = \delta_{ab}/g . \qquad (138)$$


Since  the  tensor  is  positive  definite,  we  have 

Theorem  6:  For  a  continuous  DLS  with  r  =  0  ,  and  a  neuron  output  function  which  is 
either  a  hyperbolic  tangent  or  a  piecewise  linear  function,  the  origin  is  asymptotically 
stable. 


In order to investigate the stability of the other stationary points, $y_r$, we resort to a trick that allows efficient use of equation (136). For a radial displacement

$$\delta y_a = \delta\mu\; y_a \qquad (139)$$

one gets

$$\delta^2 E = \delta y_a\, \delta y_b\, \partial_b \partial_a E = (\delta\mu)^2 \bigl( -2 y_a S_{abc}\, y_b y_c + R^2/g \bigr) , \qquad (140)$$

where (135) and (33) have been used. In the S term equation (136) can be used readily, with the result

$$\delta^2 E = (\delta\mu)^2 (-2 y_a y_a + R^2)/g . \qquad (141)$$

Using (33) once more, (141) may be written as

$$\delta^2 E = -(\delta\mu)^2 R^2/g . \qquad (142)$$

Since the second variation $\delta^2 E$ is negative, we have

Theorem 7: For a DLS with r=0, and an output function that is either a hyperbolic tangent or a piecewise linear function, the stationary states $y_r$ in the proportional region, but away from the origin, are unstable.

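Calculation of the stationary points $y_r$ requires that the connection tensor be specified. For the unsubtracted tensor (14), equation (136) takes the form

$$y_a = gN \sum_\alpha h_{\alpha a}\, \hat y_\alpha^2 . \qquad (143)$$

The equations can be decoupled by means of a Hadamard transform, with the result

$$\hat y_\alpha = g\sqrt{N^3}\, \hat y_\alpha^2 . \qquad (144)$$

This implies that

$$\text{either } \hat y_\alpha = 1/(g\sqrt{N^3}) \text{ or } \hat y_\alpha = 0 . \qquad (145)$$

Denote by A the set of all indices $\alpha$ for which $\hat y_\alpha \neq 0$,

$$A = \{ \alpha \mid \hat y_\alpha \neq 0 \} . \qquad (146)$$

By (145), the solutions $y_r$ of (143) away from the origin may then be written

$$\text{if } \alpha \in A , \ \hat y_\alpha = 1/(g\sqrt{N^3}) , \text{ else } 0 . \qquad (147)$$

Using (50), the norm $R_r$ of $y_r$ is found to be

$$R_r = \sqrt{m}/(g\sqrt{N^3}) , \qquad (148)$$

where m>0 is the cardinality of the index set A.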

For  any  signal  y,  the  spectrwnu j)  is  defined  as  the  normalized  Hadamard 


(146) 


(147) 


(148) 
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components, 

=  ya/R ,  (149) 

where  R  is  the  Euclidean  norm  of  y  given  by  (50);  one  has 

$%a  =  1  ■  (150) 

The  unstable  stationary  points  y  of  (147)  have  the  spectrum 

r 

if  a  e  A,  <j>Q  =  1/v/m  ,  else  0  .  (151) 

The  spectra  given  by  (151)  play  an  important  role  in  the  theory;  we  call  them  principal 
spectra.  The  integer  m  is  called  the  order  of  the  principal  spectrum. 

The principal spectra of order one have a unit vector for \phi. We call such spectra pure; they have signals that are proportional to a Hadamard vector. The principal spectra of order two have two components equal to 1/\sqrt{2} and all other components zero; they have signals y that are proportional to the sum of two Hadamard vectors. The number of principal spectra of order m is C(N,m) = N!/(m!(N-m)!). The total number of principal spectra is 2^N - 1, which is nearly as large as the number of corner points of the solid hypercube.

The  set  of  unstable  stationary  points  y  given  by  (147)  is  called  the  ridge  set ,  since 
they  lie  on  the  ridge  of  the  central  crater.  For  ease  of  reference  we  state  the  result 

Theorem  8:  The  ridge  set  has  principal  spectra. 
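As a numerical cross-check of (143) to (148), the following sketch builds a ridge point from an index set A and tests the stationarity condition (136). It assumes that the unsubtracted tensor (14) has the form S_abc = sum over alpha of h_{alpha a} h_{alpha b} h_{alpha c}, that the Hadamard vectors are the rows of a Sylvester-type matrix, and that the Hadamard transform carries the factor 1/sqrt(N). These assumptions, the parameter values, and all function names are illustrative and are not taken from the report.

```python
import math

def sylvester_hadamard(n):
    """Return an n x n Sylvester Hadamard matrix (n a power of two) as lists of +1/-1."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def ridge_point(n, g, index_set):
    """Neuron-frame signal whose Hadamard components equal 1/(g*sqrt(n^3)) on index_set, cf. (147)."""
    h = sylvester_hadamard(n)
    amp = 1.0 / (g * math.sqrt(n ** 3))
    # Inverse transform (assumed normalization): y_a = (1/sqrt(n)) * sum_alpha h[alpha][a] * y_alpha
    return [sum(h[al][a] for al in index_set) * amp / math.sqrt(n) for a in range(n)]

def quadratic_drive(n, y):
    """S_abc y_b y_c for the assumed tensor S_abc = sum_alpha h[alpha][a] h[alpha][b] h[alpha][c]."""
    h = sylvester_hadamard(n)
    proj = [sum(h[al][b] * y[b] for b in range(n)) for al in range(n)]
    return [sum(h[al][a] * proj[al] ** 2 for al in range(n)) for a in range(n)]

n, g, A = 8, 5.0, [1, 4, 6]          # hypothetical dimension, gain, and index set
y = ridge_point(n, g, A)
# Residual of the stationarity condition y_a = g * S_abc y_b y_c, cf. (136)
residual = max(abs(ya - g * s) for ya, s in zip(y, quadratic_drive(n, y)))
radius = math.sqrt(sum(v * v for v in y))   # should match (148) with m = len(A)
```

Under these assumptions the residual vanishes to rounding error and the radius agrees with (148) for m equal to the size of A.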


The spectrum of y may be seen as a unit vector along y referred to the Hadamard base. It is of interest to write (149) as

    y_\alpha = R \phi_\alpha    (152)

and express the energy in terms of R and \phi_\alpha. For the unsubtracted dynamics the energy (133) then takes the form

    E(R,\phi) = -(1/3) \sqrt{N^3} R^3 G + R^2/(2g) ,    (153)

where

    G = \sum_\alpha \phi_\alpha^3 .    (154)

Expression (153) for the energy can be used for the further exploration of the energy landscape, in two ways. One way is to fix the \phi_\alpha and consider E as function of R. This amounts to seeing how the energy changes along a ray through the origin. The direction of the ray is set by the spectrum \phi_\alpha. In the second method the radius R is fixed, and we regard E as function of the \phi_\alpha. This function shows how the energy varies over the hypersphere H_R centered at the origin. Together these two cuts provide a complete picture of the salient features of the energy landscape in the proportional region. We proceed with the first cut.

For fixed G, the function E(R) of (153) has a minimum at R=0, and a maximum at

    R = R_m = 1/(g \sqrt{N^3} G) .    (155)

The R=0 minimum is, of course, the stable point at the origin found before. The energy at the maximum is

    E(R_m) = 1/(6 g^3 N^3 G^2) .    (156)

E(R) is zero at R = (3/2) R_m. It follows that there is a central crater surrounded by a ridge; farther out the energy decreases to negative values. The ridge height given by (156) depends on the value G, which by (154) depends on the direction of the ray (152). Hence, the ridge is undulated, so that it has passes and peaks. The stationary points on the ridge have principal spectra, by Theorem 8. For the principal spectra we have

    G = 1/\sqrt{m} .    (157)

Using (157) in (155) and (156) gives, for principal spectra,

    R_m = \sqrt{m}/(g \sqrt{N^3}) ,    (158)

and

    E(R_m) = m/(6 g^3 N^3) ,    (159)

which shows that the ridge can be passed most easily for m=1, i.e., along a ray pointing to a Hadamard corner. Also, among the principal spectra, the pass is located nearest the origin along those directions.

We need to be concerned that the points of the ridge set lie indeed in the proportional region. For the piecewise linear output function, this region is given by

    |y_a| < 1 .    (160)

For the ridge set we have, from (147) and a Hadamard transform,

    y_a = (1/(g N^2)) \sum_{\alpha \in A} h_{\alpha a} .    (161)

Since the sum over m Hadamard vectors has components which have at most the magnitude m, a sufficient condition for the signal y of (161) to lie in the proportional region is

    g > m/N^2 .    (162)

This condition is satisfied for any 0 < m \le N if

    g > 1/N .    (163)

For the hyperbolic tangent output function, the proportional region is smaller than that given by (160) by a factor that depends on the approximation accuracy required. By (C9), the relative accuracy of the linear approximation is about y^2/3, where y is the maximum magnitude among the components y_a. For instance, a 4% accurate linear approximation results if |y_a| < 1/3, \forall a. In order to obtain such accuracy for the ridge set it would be sufficient to replace (163) by

    g > 3/N .    (164)

Conditions (163) and (164) are easily satisfied, because in practice we want large gains in order that the DLS be fast.

In practice, the central crater is very small. For example, for N=16 and g=50, the radius R_m ranges from 1/3200 to 1/800, as m ranges from 1 to 16. These radii should be compared to the radius R = \sqrt{16} = 4 for points of I_{16}.
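The radial cut through the energy landscape can be tabulated directly from (153), (155), and (156). The short sketch below does this for the example N=16, g=50 quoted above; the function names are illustrative only.

```python
import math

def energy(R, G, N, g):
    """Energy along a ray with cubic sum G, cf. (153)."""
    return -(1.0 / 3.0) * math.sqrt(N ** 3) * R ** 3 * G + R ** 2 / (2.0 * g)

def ridge_radius(G, N, g):
    """Radius of the energy maximum along the ray, cf. (155)."""
    return 1.0 / (g * math.sqrt(N ** 3) * G)

def ridge_height(G, N, g):
    """Energy at the ridge, cf. (156)."""
    return 1.0 / (6.0 * g ** 3 * N ** 3 * G ** 2)

N, g = 16, 50.0
# Principal spectra of order m have G = 1/sqrt(m), cf. (157)
radii = {m: ridge_radius(1.0 / math.sqrt(m), N, g) for m in (1, 16)}
R1 = radii[1]
height_direct = energy(R1, 1.0, N, g)        # energy evaluated at the ridge radius
zero_crossing = energy(1.5 * R1, 1.0, N, g)  # E should vanish at (3/2) R_m
```

The dictionary `radii` reproduces the range 1/3200 to 1/800, and the energy evaluated at 1.5 R_m vanishes to rounding error.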


Next, we take the hyperspherical cross section. On any hypersphere H_R of points y_\alpha,

    y_\alpha y_\alpha = R^2 ,    (165)

which lie in the proportional region, the stationary points of E given by (153), for fixed R, are the stationary points of G, subject to the subsidiary condition (150). Those points can be determined by using a Lagrangian multiplier,

    0 = \partial_\alpha (G + \Lambda (\phi_\beta \phi_\beta - 1)) = 3 \phi_\alpha^2 + 2 \Lambda \phi_\alpha .    (166)

The multiplier \Lambda is calculated by multiplying with \phi_\alpha and summing over \alpha; the result is

    \Lambda = -(3/2) G .    (167)

Substitution in (166) gives

    0 = \phi_\alpha (\phi_\alpha - G) ,    (168)

and it follows that for a stationary point of the energy on the hypersphere there is an index set A, such that

    if \alpha \in A, \phi_\alpha = G , else 0 .    (169)

The value of G can be determined from (154):

    G = m G^3 ,    (170)

where m is the cardinality of the set A. The solutions of (170) are G=0 and G = \pm 1/\sqrt{m}. It follows that the stationary points of the energy on the hypersphere have the spectra

    if \alpha \in A, \phi_\alpha = 1/\sqrt{m} , else 0 ,
    or                                                      (171)
    if \alpha \in A, \phi_\alpha = -1/\sqrt{m} , else 0 ,

where A is any index set, and m is its cardinality. From (171) and (151) we have

Theorem 9: For a DLS with r=0, and an output function that is either a hyperbolic tangent or a piecewise linear function, the stationary points of the energy function constrained to a hypersphere H_R have, in the proportional region, ±principal spectra.


It can be shown that for 1<m<N these stationary points are saddle points. For m=1, i.e., the pure spectra, the positive solutions (171) are energy minima, and the negative solutions are energy maxima. For m=N, the positive solution (171) is an energy maximum, and the negative solution is an energy minimum.
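The Lagrange condition (166) with the multiplier (167) can be evaluated numerically for any unit spectrum. The sketch below, with illustrative names and test vectors, shows that the residual vanishes for ±principal spectra and not for a generic unit vector.

```python
import math

def cubic_sum(phi):
    """G of (154): sum of the cubed spectrum components."""
    return sum(p ** 3 for p in phi)

def lagrange_residual(phi):
    """Largest |3*phi^2 + 2*Lambda*phi| over components, with Lambda = -(3/2) G, cf. (166)-(167)."""
    lam = -1.5 * cubic_sum(phi)
    return max(abs(3.0 * p * p + 2.0 * lam * p) for p in phi)

def principal_spectrum(N, index_set, sign=1):
    """Spectrum with components sign/sqrt(m) on index_set and 0 elsewhere, cf. (171)."""
    m = len(index_set)
    return [sign / math.sqrt(m) if a in index_set else 0.0 for a in range(N)]

plus = principal_spectrum(6, {0, 2, 5})
minus = principal_spectrum(6, {0, 2, 5}, sign=-1)
generic = [0.6, 0.8, 0.0, 0.0, 0.0, 0.0]   # unit vector that is not a principal spectrum
res_plus, res_minus, res_generic = map(lagrange_residual, (plus, minus, generic))
```

For the order-three example, `cubic_sum(plus)` equals 1/sqrt(3), in agreement with (157).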


The exploration of the energy landscape so far was restricted to the proportional region. Farther out in signal space, near the boundaries of the solid hypercube J_N = [-1,1]^N, the energy surface has a lip which turns up near these boundaries, and provides containment of the state point. In the equations of motion the lip corresponds to the term -v_a outside the proportional region. This term can assume any value in balancing the equations of motion at equilibrium. Large magnitudes of v_a correspond to the large slopes available on the lip of the energy surface. The lip structure is expressed by the features of the function \Phi that are not described by the approximation R^2/(2g) valid in the proportional region.


DECOMPOSITION  INTO  LONGITUDINAL  AND  TRANSVERSE  DYNAMICS 


It is useful to introduce the notions of longitudinal and transverse parts of the equations of motion. This involves defining the longitudinal part p_\parallel of any vector p in signal space as the part of p in the direction of y, and the transverse part p_\perp as the part of p perpendicular to y. One has

    p_\perp = p - p_\parallel .    (172)

Application to the activation velocity vector \dot{v} defined by the equations of motion (34) for unsubtracted dynamics gives the longitudinal equation of motion

    \dot{v}_{\parallel\alpha} = -v_{\parallel\alpha} + R \sqrt{N^3} G y_\alpha + r_{\parallel\alpha} ,    (173)

and the transverse equations of motion

    \dot{v}_{\perp\alpha} = -v_{\perp\alpha} + \sqrt{N^3} (y_\alpha^2 - R G y_\alpha) + r_{\perp\alpha} ,    (174)

where G is given by (154).
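The split (172) is an ordinary orthogonal projection onto the direction of y. The sketch below applies it to the quadratic term sqrt(N^3) y_alpha^2 of the Hadamard-transformed equations of motion and compares the two parts with the corresponding terms in (173) and (174). The test vector and the function names are hypothetical, and the quadratic term is written in the form implied by (174).

```python
import math

def split(p, y):
    """Return the parts of p parallel and perpendicular to y, cf. (172)."""
    coef = sum(pi * yi for pi, yi in zip(p, y)) / sum(yi * yi for yi in y)
    par = [coef * yi for yi in y]
    perp = [pi - qi for pi, qi in zip(p, par)]
    return par, perp

N = 4
y = [0.3, -0.1, 0.2, 0.05]                     # hypothetical Hadamard components y_alpha
R = math.sqrt(sum(v * v for v in y))
G = sum((v / R) ** 3 for v in y)               # cubic sum of the spectrum, cf. (154)
s = math.sqrt(N ** 3)

quad = [s * v * v for v in y]                  # quadratic drive sqrt(N^3) * y_alpha^2
par, perp = split(quad, y)
par_expected = [R * s * G * v for v in y]              # term in (173)
perp_expected = [s * (v * v - R * G * v) for v in y]   # term in (174)
```

The agreement of `par` with `par_expected` and of `perp` with `perp_expected` holds for any nonzero y, not only for this test vector.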

For ease of discussion we write the transverse equations of motion as

    \dot{v}_{\perp\alpha} = -v_{\perp\alpha} + F_\alpha ,    (175)

where

    F_\alpha = \sqrt{N^3} (y_\alpha^2 - R G y_\alpha) + r_{\perp\alpha} .    (176)


ADIABATIC FAKE DYNAMICS

The split of the equations of motion into longitudinal and transverse parts may be used in the following manner. For fixed R, the transverse equation (175) describes the activation velocity on a centered hypersphere of radius R in signal space. Suppose we constrain the signal state point to remain on H_R, while allowing the transverse dynamics given by (175). The state point on H_R will move to the transverse equilibrium, where \dot{v}_\perp = 0, and

    v_\perp = F ,    (177)

by (175). We delete the longitudinal equation of motion (173) from the dynamics for now. In the modified dynamics, we fix R, and wait for transverse equilibrium to be reached on H_R. After that has happened, we move to the "next" sphere and wait for transverse equilibrium to be reached on that sphere. Then, we proceed to the next hypersphere, etc. We call this fake dynamics transverse adiabatic dynamics. This dynamics may be executed either forward or backward, as the radius R of the hypersphere is increased or decreased in succession. The solution y of (177) depends on the parameter R; the path y(R), R = 0 to R_m, is called a transverse adiabatic path. R_m is the maximum value of R for which y lies in the solid hypercube J_N.


Let the signal point y_s be stationary in the true dynamics. Choose R such that y_s lies on H_R. Then, move backward through the transverse adiabatic path. As the radius R is diminished, the signal eventually falls in the proportional region, where

    v_\alpha = y_\alpha/g .    (178)

But this implies that v is longitudinal, i.e.,

    v_\perp = 0 .    (179)

With (177) it follows that in the proportional region we have

    F = 0 .    (180)

This shows

Theorem 10: For a DLS with unsubtracted dynamics, let y_s be a stationary point of the true dynamics, and let P be the transverse adiabatic path through y_s. Then, F = 0 along P in the proportional region.


It follows that there is a correspondence between stationary points and the solutions of Eq. (180). We proceed to find these solutions, for the simple case r = 0. Then (176) becomes

    F_\alpha = \sqrt{N^3} (y_\alpha^2 - R G y_\alpha) .    (181)

With (152) this may be written

    F_\alpha = R^2 \sqrt{N^3} (\phi_\alpha^2 - G \phi_\alpha) ,    (182)

and (180) gives

    \phi_\alpha^2 - G \phi_\alpha = 0 .    (183)

But this is the same as condition (168), so that the transverse equilibria on H_R coincide with the stationary points of the energy function constrained to the hypersphere H_R. Of course, such coincidence was expected. With Theorem 9 we have the result

Theorem 11: For a DLS with unsubtracted dynamics and r=0, the transverse equilibria in the proportional region have ±principal spectra.

Each of the solutions of (180) corresponds to a stationary point of the true dynamics, by Theorem 10. A stationary point y_s corresponding to the solution y_0 of (180) must lie on the transverse adiabatic path y(R) through y_0. Its location on the path is such that there is longitudinal equilibrium, as stated by (173) with \dot{v}_\parallel = 0,

    v_{\parallel\alpha} = R \sqrt{N^3} G y_\alpha ;    (184)

r has been set to zero, as before.

For r=0, signals with principal spectra of orders one and two have transverse adiabatic paths that are straight lines through the origin, as will be discussed in the next section. For these cases, the location of the stationary point y_s on the transverse adiabatic path (180) may be determined from the longitudinal equilibrium condition (184). Since v_\perp = 0 along the radial transverse adiabatic path, (184) gives

    v_s = R \sqrt{N^3} G y_s .    (185)

For a principal spectrum of order m, we have

    G = \pm 1/\sqrt{m} ,    (186)

and (185) becomes

    v_s = \pm R \sqrt{N^3/m} \, y_s .    (187)

A solution of (185) may be obtained by writing

    v_s = v e ,  y_s = R e ,    (188)

where e is the unit vector along y. Eq. (187) then gives

    v = R^2 \sqrt{N^3} G ,    (189)

with G=1 for m=1, and G=1/\sqrt{2} for m=2. For the hyperbolic tangent output function (18) the function v(R) implied by R=s(v) is

    v(R) = (1/(2g)) \ln((1+R)/(1-R)) ,    (190)

by (C2), so that the solutions of (187) must satisfy the equations

    m=1 :  (1/(2g)) \ln((1+R)/(1-R)) = R^2 \sqrt{N^3} ,    (191)

    m=2 :  (1/(2g)) \ln((1+R)/(1-R)) = R^2 \sqrt{N^3/2} .    (192)

The stationary points of the true dynamics for signals with principal spectra of orders one and two can be found by solving Eqs. (191) and (192) for R.
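Equations (191) and (192) are transcendental, so a numerical root finder is a natural tool. The sketch below uses plain bisection and takes the two equations exactly as written above, with the left side expressed through the inverse hyperbolic tangent. The values g=0.5 and N=4 are chosen only so that the outer root stays resolvable in double precision; for the large gains of practical interest the root lies extremely close to R=1. The function names and the bracket [0.5, 1) are assumptions made for this illustration.

```python
import math

def residual(R, g, N, m):
    """Left side minus right side of (191) for m=1 or (192) for m=2."""
    G = 1.0 / math.sqrt(m)
    # atanh(R)/g equals (1/(2g)) * ln((1+R)/(1-R))
    return math.atanh(R) / g - R * R * math.sqrt(N ** 3) * G

def bisect(f, lo, hi, steps=200):
    """Bisection on [lo, hi]; f(lo) and f(hi) are assumed to differ in sign."""
    flo = f(lo)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

g, N = 0.5, 4    # illustrative values only
roots = {m: bisect(lambda R: residual(R, g, N, m), 0.5, 1.0 - 1e-12) for m in (1, 2)}
```

The bracket deliberately excludes the small-R solution near the ridge radius (155), so the root returned is the outer one.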


CONSERVATION  OF  PRINCIPAL  SPECTRA 

We have seen that, for unsubtracted dynamics and r=0, the transverse equilibria in the proportional region have ±principal spectra. This means that the adiabatic fake dynamics in the proportional region conserves ±principal spectra. Will this also happen in the true dynamics? Let \phi_\alpha be a principal spectrum of order m. Then,

    \phi_\alpha^2 - \phi_\alpha/\sqrt{m} = 0 ,    (193)

as follows from (151). The importance of (193) is that it provides the possibility of replacing the quadratic term in the equations of motion by a linear term. For unsubtracted dynamics the equations of motion (34) then may, with (152), be cast in the form

    \dot{v}_a = -v_a + R^2 \sqrt{N^3/m} \, \phi_a + r_a ,    (194)

where a Hadamard transform has been used to write \phi_\alpha in terms of \phi_a. Suppose r has the same spectrum as y; then we may write

    r_a = c \phi_a ,    (195)

where c>0 is some fixed coefficient, and (194) becomes

    \dot{v}_a = -v_a + R^2 \sqrt{N^3/m} \, \phi_a + c \phi_a .    (196)

In the proportional region we have

    v_a = y_a/g = (R/g) \phi_a ,    (197)

so that (196) may be written

    \partial_t (R \phi_a) = (-R + g R^2 \sqrt{N^3/m} + g c) \phi_a .    (198)

This equation is satisfied if

    \dot{R} = -R + g R^2 \sqrt{N^3/m} + g c ,    (199)

and

    \dot{\phi}_a = 0 ,    (200)

consistent with (193). Hence we have as result

Theorem 12: Let \phi be a principal spectrum. For a DLS with unsubtracted dynamics and r = c\phi, the principal spectrum \phi is conserved in the proportional region.

Initial conditions and c can be chosen such that the function R(t) subject to (199) increases monotonically. Since the spectrum is constant as the signal point traverses the proportional region, the path is a straight line through the origin. The path is a straight line only if the spectrum is ±principal. Upon leaving the proportional region, the path generally curves away from a straight line, because of distortions in the spectrum produced by the nonlinearity of the output function. We know of two exceptions to this behavior, viz., the principal spectra with m=1 and m=2.
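One sufficient way to make R(t) of (199) increase monotonically is to choose c so large that the quadratic on the right side has no real root; dR/dt is then positive for every R. The sketch below checks that condition and integrates (199) with a forward-Euler step. The step size, the parameter values, and the function names are illustrative assumptions, and the integration is meaningful only while the signal stays in the proportional region.

```python
import math

def radial_rate(R, g, N, m, c):
    """Right-hand side of (199)."""
    return -R + g * R * R * math.sqrt(N ** 3 / m) + g * c

def always_growing(g, N, m, c):
    """True if the quadratic in (199) has no real root, so that dR/dt > 0 for all R."""
    a = g * math.sqrt(N ** 3 / m)
    return 4.0 * a * g * c > 1.0

def integrate_radius(g, N, m, c, dt=1e-3, steps=500):
    """Forward-Euler integration of (199) from R(0) = 0; returns the list of R values."""
    R, path = 0.0, [0.0]
    for _ in range(steps):
        R += dt * radial_rate(R, g, N, m, c)
        path.append(R)
    return path

path = integrate_radius(g=50.0, N=16, m=1, c=1e-5)
```

With g=50, N=16, m=1 the no-real-root condition amounts to c > 1/640000, so c = 1e-5 gives growth from the origin while c = 1e-6 does not satisfy it.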

In the former case, the signal is proportional to a single Hadamard vector, and all \phi_a (the spectrum components in the neuron frame) have the same magnitude, 1/\sqrt{N}. The nonlinearity of the output function s(.) then applies uniformly to all components, so that in (196) the |v_a| remain equal to each other. It follows that the signal path remains straight all the way to the stationary point in the Hadamard corner.

For the other case, m=2, the spectrum is proportional to the sum of two Hadamard vectors, say h_\beta and h_\gamma,

    \phi_a = (1/\sqrt{2N}) (h_{\beta a} + h_{\gamma a}) ,    (210)

so that the components \phi_a are 2/\sqrt{2N}, -2/\sqrt{2N}, or 0. A zero value for \phi_a gives v_a = 0 by the antisymmetry of the neuron output function s(.), and that is consistent with (196). The other two possible values of \phi_a lie symmetric with respect to 0, and will give equal magnitudes for the corresponding components of v, again by the antisymmetry of s(.); this is consistent with the equations of motion (196). It follows that the signal path remains straight outside the proportional region, all the way up to the stationary state. For sufficient gain, the stationary state will have components 0 and ±A, where A is close to or equal to unity, depending on the output function used. By Theorem 3, such a stationary state is unstable.


SPURIOUS  STATES  SHIELD  SPOILS  EARLY  DYNAMICS 

The foregoing explorations provide an orientation and preparation for the main dynamics problem of the DLS: With the initial activation reset to zero, and the BLT output u presented to the DLS as external coupling, at what state will the DLS settle? We want this final state to be the Hadamard vector that is dominant in u.

In this regard we need to be concerned about the term (N^2 - 4N) h_{1a} in the expression (66) for r_a. This term has the large magnitude N^2 - 4N, and it was added to r to shield against spurious states, as can be seen from the proof of Theorem 2. The term tilts the energy surface by a large amount in the direction of the first Hadamard vector, h_1. As a result, the signal state point y will, in its gradient descent down the energy surface (in the proportional region), slide over the side of the gully that runs towards the dominant Hadamard point, and end up in either the adjacent gully, or in a gully several Hadamard vectors over. In any case, the state point will have left the gully which leads to the correct Hadamard point. This causes the DLS to settle at the wrong Hadamard point. The unwanted effect is largest close to the origin, where the undulating features in the energy landscape are subtle, so that the tilt of the energy surface has a large effect. Hence, the large threshold term (N^2 - 4N) h_{1a}, which was deployed as a shield against spurious stable states, spoils the early dynamics.

What is to be done? Either we have the protection against spurious states, and the wrong early dynamics, or we have the correct early dynamics, but face, later in the state development, the hazard of ending up at a spurious state. Under the circumstances, we choose the latter. The troublesome term (N^2 - 4N) h_{1a} in the vector r is dropped, and we use the expression (41) or (44) for r_a in the external coupling or the initial value coupling. In either scheme, the vector r_a still contains a term that is proportional to \delta_{a1} with magnitude cN in the external coupling, and with magnitude c in the initial value coupling. These terms cause no problems with the dynamics, because they provide the same force in the direction of all Hadamard points.

We  proceed  with  the  investigation  of  the  dynamics  with  this  arrangement. 


DYNAMICS  IN  THE  PROPORTIONAL  REGION 

For unsubtracted dynamics, the Hadamard transform of the equations of motion in the proportional region is given by (48). These equations are uncoupled and they can be integrated as follows. The index \alpha is temporarily suppressed. (48) may be written

    dy/(y-y_+) - dy/(y-y_-) = (y_+ - y_-) g \sqrt{N^3} dt ,    (211)

where y_+ and y_- are the roots of the quadratic form

    y^2 - y/(g \sqrt{N^3}) + r/\sqrt{N^3} .    (212)

We recall that for r = 0 the energy landscape features a central crater surrounded by an undulated ridge. Applying r_a = \mu (u_a + c N \delta_{a1}) in the external coupling scheme means tilting the energy surface. For very small \mu this shifts the stable stationary point away from the origin. A second stationary point occurs at larger radius, and this point is unstable. Choosing progressively larger values for the coupling constant \mu makes the two stationary points come closer, coalesce, and disappear. This corresponds to the discriminant of the quadratic form (212) becoming smaller, zero, and negative. There is no stationary point in the proportional region iff there is an index \alpha for which the quadratic form has complex roots; there is then no point at which \dot{y} = 0. The three cases: different real roots, coinciding roots, and complex roots have different dynamics and must be considered separately.
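The three cases can be told apart by the sign of the discriminant of the quadratic form (212). A minimal sketch follows; the relative tolerance used to detect coinciding roots and the function names are assumptions made for this illustration.

```python
import math

def discriminant(r, g, N):
    """Discriminant of y^2 - y/(g*sqrt(N^3)) + r/sqrt(N^3), cf. (212)."""
    return 1.0 / (g * g * N ** 3) - 4.0 * r / math.sqrt(N ** 3)

def root_case(r, g, N, rel_tol=1e-12):
    """Classify the roots of (212) as 'real', 'coinciding', or 'complex'."""
    scale = 1.0 / (g * g * N ** 3)       # size of the first term, used to scale the tolerance
    d = discriminant(r, g, N)
    if d > rel_tol * scale:
        return "real"
    if d < -rel_tol * scale:
        return "complex"
    return "coinciding"

# With N=4 and g=1 the roots coincide at r = 1/32
cases = [root_case(r, 1.0, 4) for r in (0.0, 1.0 / 32.0, 0.1)]
```

Setting the discriminant to zero gives the threshold r = 1/(4 g^2 \sqrt{N^3}), above which the roots are complex.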
Starting with the case that the roots y_\pm are complex, we have

    y_\pm = \rho e^{\pm i\eta} ,    (213)

where

    \rho = \sqrt{r}/N^{3/4} ,    (214)

    \cos\eta = 1/(2 g \sqrt{r} N^{3/4}) .    (215)

(211) may be rewritten as

    dy/(y-y_+) - dy/(y-y_-) = 2 i \nu dt ,    (216)

where

    \nu = \sqrt{g^2 r \sqrt{N^3} - 1/4} .    (217)

The integral of (216) with initial condition y(0)=0 is

    \ln((y-y_+)/(y-y_-)) - \ln(y_+/y_-) = 2 i \nu t ,    (218)

which may be written

    (y-y_+)/(y-y_-) = (y_+/y_-) e^{2i\nu t} .    (219)

This gives

    y = y_+ y_- (1 - e^{2i\nu t})/(y_- - y_+ e^{2i\nu t})
      = y_+ y_- (e^{i\nu t} - e^{-i\nu t})/(y_+ e^{i\nu t} - y_- e^{-i\nu t})
      = (y_+ y_-/\rho) (e^{i\nu t} - e^{-i\nu t})/(e^{i(\eta+\nu t)} - e^{-i(\eta+\nu t)}) ,

with the result

    y_\alpha = \rho_\alpha \sin(\nu_\alpha t)/\sin(\eta_\alpha + \nu_\alpha t) ,    (220)

where the index \alpha has been reinstalled. With the index \alpha shown, (214), (215), and (217) read

    \rho_\alpha = \sqrt{r_\alpha}/N^{3/4} ,    (221)

    \cos\eta_\alpha = 1/(2 g \sqrt{r_\alpha} N^{3/4}) ,    (222)

    \nu_\alpha = \sqrt{g^2 r_\alpha \sqrt{N^3} - 1/4} .    (223)

In the external coupling scheme we have

    r_\alpha = \mu (u_\alpha + c \sqrt{N} h_{\alpha 1}) ,    (224)

where u is the BLT output (3). In (224) we have h_{\alpha 1} = 1, \forall \alpha. In the interest of readability, we will drop the h_{\alpha 1} and write (224) as

    r_\alpha = \mu (u_\alpha + c \sqrt{N}) ,    (225)

committing a notational sin in tensor calculus since in (225), a scalar c\sqrt{N} appears to be added to a vector u_\alpha. Using the Hadamard transform of (3) in (225) gives

    r_\alpha = \mu \sqrt{N} (c_\alpha + c) ,    (226)

where

    c_\alpha = x \cdot q_\alpha .    (227)

Recall that x is the SRM input vector, and the q_\alpha are the bipolar stored vectors. Use of (226) gives for (221), (222), and (223)

    \rho_\alpha = \sqrt{\mu (c_\alpha + c)/N} ,    (228)

    \cos\eta_\alpha = 1/(2 g N \sqrt{\mu (c_\alpha + c)}) ,    (229)

    \nu_\alpha = \sqrt{g^2 N^2 \mu (c_\alpha + c) - 1/4} .    (230)
So  much  for  the  case  that  the  quadratic  form  (212)  has  complex  roots.  Two  other  cases 
remain  to  be  investigated:  the  two  roots  coinciding,  and  two  different  real  roots. 
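For the complex-root case, the closed form (220) can be compared with a direct numerical integration. The sketch below assumes that (48) in the proportional region reads dy/dt = -y + g sqrt(N^3) y^2 + g r, which is the form implied by (211) and (212), and integrates it with the classical fourth-order Runge-Kutta scheme. The parameter values and function names are illustrative; the end time is chosen well before the pole of (220).

```python
import math

def closed_form(t, r, g, N):
    """Equation (220) for a single Hadamard component with complex roots."""
    rho = math.sqrt(r) / N ** 0.75                                    # (221)
    eta = math.acos(1.0 / (2.0 * g * math.sqrt(r) * N ** 0.75))       # (222)
    nu = math.sqrt(g * g * r * math.sqrt(N ** 3) - 0.25)              # (223)
    return rho * math.sin(nu * t) / math.sin(eta + nu * t)

def rk4(t_end, r, g, N, steps=2000):
    """Integrate the assumed form dy/dt = -y + g*sqrt(N^3)*y^2 + g*r from y(0) = 0."""
    k = g * math.sqrt(N ** 3)
    f = lambda y: -y + k * y * y + g * r
    y, h = 0.0, t_end / steps
    for _ in range(steps):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y

N, g, r = 16, 50.0, 1e-4      # g^2 * r * sqrt(N^3) = 16 > 1/4, so the roots are complex
y_exact = closed_form(0.2, r, g, N)
y_num = rk4(0.2, r, g, N)
```

The two values agree to well within the accuracy of the integrator, which supports the reconstruction of (220) under the stated form of (48).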

For the coinciding roots, the discriminant

    D_\alpha = 1/(g^2 N^3) - 4 r_\alpha/\sqrt{N^3}    (231)

is zero; this implies

    \mu (c_\alpha + c) = 1/(4 g^2 N^2) .    (232)

The single root is

    y_\infty = 1/(2 g \sqrt{N^3}) ,    (233)

and the equation of motion (48) with initial condition y_\alpha(0)=0 has the integral

    y_\alpha = y_\infty \, y_\infty g \sqrt{N^3} t / (1 + y_\infty g \sqrt{N^3} t) .    (234)

This shows that

    t>0 ,  0 < y_\alpha < y_\infty = 1/(2 g \sqrt{N^3}) .    (235)

For large dimension and gain, y_\infty is small.


Next, we consider the case with two different real roots. There are three subcases: c_\alpha + c < 0, c_\alpha + c = 0, and c_\alpha + c > 0. For c_\alpha + c = 0, the equation of motion (48) with initial condition y_\alpha(0)=0 gives y_\alpha = 0 for all times. For c_\alpha + c < 0, the roots are written

    y_+ = \rho e^{\eta'} ,  y_- = -\rho e^{-\eta'} ,    (236)

where

    \rho = \sqrt{-r}/N^{3/4} ,    (237)

and \eta' is determined by

    \sinh\eta' = 1/(2 g N^{3/4} \sqrt{-r}) .    (238)

With

    \nu' = \sqrt{1/4 - r g^2 \sqrt{N^3}} ,    (239)

the equation of motion (48) with initial condition y(0)=0 has the integral

    y = -\rho \sinh(\nu' t)/\cosh(\eta' + \nu' t) .    (240)

For large t we have

    y \to -\rho e^{-\eta'} = y_- .    (241)

Since the function y(t) of (240) is monotone for t>0, we have

    t>0 ,  0 \ge y_\alpha > y_{\alpha-} ,    (242)

where the index \alpha has been reinstalled. Use of (226) gives

    y_{\alpha-} = (1/2 - \sqrt{1/4 - \mu (c_\alpha + c) g^2 N^2})/(g \sqrt{N^3}) .    (243)

For

    -\mu (c_\alpha + c) g^2 N^2 << 1    (244)

we have

    y_{\alpha-} \approx \mu (c_\alpha + c) g \sqrt{N} ,    (245)

which is small for large dimension and gain, considering (244). For -\mu (c_\alpha + c) g^2 N^2 of order unity, y_{\alpha-} of (243) is small for large g and N. For -\mu (c_\alpha + c) g^2 N^2 >> 1 we have

    |y_{\alpha-}| \approx \sqrt{-\mu (c_\alpha + c)/N} ,    (246)

which is small for large dimension. By (242) it follows that |y_\alpha| always remains small if g and N are large.

Finally, for the case with real roots y_\pm and c_\alpha + c > 0 we have

    y_\pm = \rho e^{\pm\eta} ,    (247)

    \rho = \sqrt{r}/N^{3/4} ,    (248)

    \cosh\eta = 1/(2 g \sqrt{r} N^{3/4}) ,    (249)

    \nu = \sqrt{1/4 - r g^2 N^{3/2}} .    (250)

The integral of the equation of motion (48) with initial condition y(0)=0 is

    y = \rho \sinh(\nu t)/\sinh(\eta + \nu t) .    (251)

For large t we have

    y \to \rho e^{-\eta} = y_- .    (252)

Since the function y(t) given by (251) is monotone, and y(0)=0, it follows that

    t>0 ,  0 < y_\alpha < y_{\alpha-} = (1 - \sqrt{1 - 4 \mu (c_\alpha + c) g^2 N^2})/(2 g \sqrt{N^3}) ,    (253)

where the index \alpha has been restored. Since the case considered has two different real roots and c_\alpha + c > 0, we have 0 < 4 \mu (c_\alpha + c) g^2 N^2 < 1 in (253), and it follows that

    0 < 1 - \sqrt{1 - 4 \mu (c_\alpha + c) g^2 N^2} < 1 .    (254)

Hence, y_{\alpha-} is small for large g and N.

We have seen that, for the case that the quadratic form (212) has real roots, |y_\alpha| has a bound that is small for large g and N. Hence, for a DLS with unsubtracted dynamics, large gain and dimension, using the external coupling scheme, the signal y can get substantially away from the origin only if there exists an index \alpha for which the roots of the quadratic form (212) are complex, i.e., if

    \exists \alpha such that g^2 N^2 \mu (c_\alpha + c) > 1/4 ;    (255)

for all such \alpha, the Hadamard component y_\alpha of the DLS signal y is given by (220). We consider here only cases such that the dominant Hadamard vector in the BLT output u is unique, i.e.,

    c_\beta > c_\alpha ,  \forall \alpha \neq \beta .    (256)

The index \beta for which (256) is true is called the dominant index. Eq. (227) shows that if two coefficients c_\alpha are different, they differ by at least 2. Since c_\alpha \in [-N,N] because of the bipolar nature of q_\alpha and x and (227), (256) implies that

    c_\beta \in [-N+2,N] .    (257)

It follows that the condition (255) written for the dominant index \beta,

    g^2 N^2 \mu (c_\beta + c) > 1/4 ,    (258)

is satisfied if we choose

    c = N ,    (259)

and

    \mu > 1/(8 g^2 N^2) .    (260)

For our DLS to work, the signal y must traverse the proportional region. With the results of this section this means that in the proportional region and for large dimension and gain, at least one Hadamard component must grow according to (220). Clearly, one of these Hadamard components is the dominant Hadamard component y_\beta. It follows that for the dominant index \beta the roots y_\pm must be complex; this is the case if (260) is satisfied. Hence we have


Lemma  5:  For  a  DLS  with  unsubtracted  dynamics,  externally  coupled  to  a  BLT,  with  large 
dimension  and  gain,  and  with  the  coupling  constant  satisfying  (260),  the  dominant 
Hadamard  component  grows  in  the  proportional  region  according  to  (220). 
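The chain from (227) to (260) can be traced with a few lines of code: compute the overlaps c_alpha, pick the dominant index, and test condition (258) for a given coupling constant. The stored vectors, the input vector, and the function names below are hypothetical and serve only as an illustration.

```python
def overlaps(x, stored):
    """c_alpha = x . q_alpha for each stored bipolar vector q_alpha, cf. (227)."""
    return [sum(xi * qi for xi, qi in zip(x, q)) for q in stored]

def minimum_coupling(g, N):
    """Lower bound on mu from (260)."""
    return 1.0 / (8.0 * g * g * N * N)

def dominant_roots_complex(mu, g, N, c_beta, c=None):
    """Condition (258); c defaults to the choice c = N of (259)."""
    c = N if c is None else c
    return g * g * N * N * mu * (c_beta + c) > 0.25

# Hypothetical stored vectors and input vector, N = 4
stored = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1]]
x = [1, -1, 1, -1]
c_alpha = overlaps(x, stored)
beta = max(range(len(c_alpha)), key=lambda a: c_alpha[a])   # dominant index

# Worst case of (257): c_beta = -N + 2, here with g = 50 and N = 16
mu_min = minimum_coupling(50.0, 16)
above = dominant_roots_complex(1.01 * mu_min, 50.0, 16, -16 + 2)
below = dominant_roots_complex(0.99 * mu_min, 50.0, 16, -16 + 2)
```

The worst-case check confirms that (260) is the sharp bound when c_\beta + c takes its smallest admissible value of 2.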

So far in this section we have considered the dynamics in the proportional region for a DLS with external coupling to the BLT output u. We briefly consider the other coupling, i.e., initial value coupling. We now have expression (44) for the vector r. The Hadamard components are

    r_\alpha = (c/\sqrt{N}) h_{\alpha 1} .    (261)

For complex roots y_\pm we have from (216), and the initial value

    y_\alpha(0) = g \mu u_\alpha ,    (262)

the solution

    y_\alpha = \rho (\rho \sin\nu t + g \mu u_\alpha \sin(\eta - \nu t))/(\rho \sin(\eta + \nu t) - g \mu u_\alpha \sin\nu t) ,    (263)

where

    \rho = \sqrt{c}/N ,    (264)

    \cos\eta = 1/(2 g \sqrt{cN}) ,    (265)

and

    \nu = \sqrt{g^2 c N - 1/4} .    (266)

The condition for the roots to be complex is

    c > 1/(4 N g^2) .    (267)

The roots must be complex in order that the signal y can traverse the proportional region. Condition (267) is satisfied if we choose

    c = 1/(2 N g^2) .    (268)

For this value of c, (264) to (266) become

    \rho = 1/(g \sqrt{2N^3}) ,    (269)

    \cos\eta = 1/\sqrt{2} ,    (270)

    \nu = 1/2 .    (271)

(270) shows that \eta = \pi/4, and (263) becomes

    y_\alpha = \rho ((\sqrt{2} - z_\alpha) \sin\nu t + z_\alpha \cos\nu t)/(\cos\nu t - (z_\alpha \sqrt{2} - 1) \sin\nu t) ,    (272)

where

    z_\alpha = g^2 \mu u_\alpha \sqrt{2N^3} .    (273)

There is a time t_{\alpha\infty} such that the function y_\alpha(t) given by (272) is monotone and increasing in the interval

    0 < t < t_{\alpha\infty} ,    (274)

and

    y_\alpha(t_{\alpha\infty}) = \infty .    (275)

t_{\alpha\infty} decreases for increasing z_\alpha. Since z_\alpha is largest for the dominant index \beta, t_{\alpha\infty} is smallest for y_\beta. It follows that y_\beta given by (272) increases indefinitely as t approaches t_{\beta\infty}. Of course, at some time during this process the signal y leaves the proportional region. The singularity at t_{\beta\infty} dominates the behavior of y_\beta as t gets close to t_{\beta\infty}. This results in spectral purification, as will be discussed in the next section. We note here a disadvantage of the initial value coupling: if u_\beta < 0, the initial signal has negative y_\beta, and it takes time for the y_\beta to become positive (as required, because we want the signal to go to h_\beta) by the action of the threshold term r given by (261). The external coupling does not have this delay, since with c given by (259), c_\beta + c in (226) is always positive, so that the y_\beta becomes positive immediately after reset of the DLS. In this regard it should also be noted that it takes time to reset the activation to any value, be it \mu (u_a + N c \delta_{a1}) or zero, because amplifiers have a finite slew rate.


SPECTRAL  PURIFICATION  IN  THE  PROPORTIONAL  REGION 

A  surprising  property  of  the  DLS  is  that  already  in  the  proportional  region,  long 
before the signal y comes close to the boundaries of the solid hypercube $J_N$, the spectrum
gets purified towards the dominant Hadamard vector. This purification is due to the
activation  nonlinearity  and  the  mathematical  nature  of  Hadamard  vectors.  The  spectral 
purification  plays  an  important  part  in  the  development  of  the  state,  which,  starting  from 
the  origin  (by  reset),  makes  its  way  towards  the  dominant  Hadamard  point  by  gradient 

descent along the energy surface. Although spectral purification occurs also for initial value
coupling, this will not be investigated here, because of the practical disadvantage of this
coupling as noted above, and to limit the length of this report. For the external coupling, the
BLT output u is coupled to the DLS by means of the term

$r_\alpha = \mu (u_\alpha + c\sqrt{N})$  (276)

in  the  Hadamard  transform  (48)  of  the  equations  of  motion;  unsubtracted  dynamics  is  used 
here.  The  BLT  output  u  is  given  by  (3),  with  the  Hadamard  components 
$u_\alpha = \sqrt{N}\, c_\alpha$ ,  (277)

where

$c_\alpha = q_\alpha \cdot x$ ,  (278)

as given by (4). Remember that $q_\alpha$ are the stored vectors, and x is the input vector. With
(277) and the choice (259) for c, (276) gives

$r_\alpha = \mu \sqrt{N}\,(c_\alpha + N)$ .  (279)


The dominant Hadamard vector in u is the Hadamard vector with the largest coefficient in
the Hadamard expansion of u. Hence, $\beta$ is the dominant index in u iff

$c_\beta > c_\alpha$ , $\forall \alpha \neq \beta$ .  (280)

We can also consider the dominant index in the signal y; it is the index $\beta$ such that

$y_\beta > y_\alpha$ , $\forall \alpha \neq \beta$ .  (281)

Both (280) and (281) imply that the dominant index is unique; we restricted the inputs x
such that this is the case. This condition excludes any input vector x which has more than
a single nearest stored state.

Since  for  early  times  the  signal  y  is  about  proportional  to  the  vector  r,  and  r  is  given 
by  (276),  the  dominant  index  in  u  is,  for  these  times,  the  same  as  the  dominant  index  in  y. 
Hence,  for  early  times,  i.e.,  close  to  the  signal  origin,  these  two  definitions  of  dominant 
index  may  be  interchanged. 


Let $\beta$ be the dominant index in y, and let $\alpha$ be the index with the next smaller
Hadamard component $c_\alpha$. We call the ratio

$R = y_\beta / y_\alpha$  (282)

the dominance ratio. The dominance ratio changes through the proportional region. As the
signal y traverses this region as described by the equation of motion (48), the norm of y
increases, until the outer edge of the region is reached, say, at $y_\beta = K > 0$. The value of K
depends on the other Hadamard components $y_\alpha$, since the boundaries of the proportional
region are given in terms of the neuron components $y_a$, not the Hadamard components $y_\alpha$. It is
convenient to have an estimate which does not require knowledge of the other Hadamard
components $y_\alpha$. For the case of a piecewise linear output function, such an estimate is given by

Lemma 6: For a DLS with piecewise linear neuron output function, let the signal y have a
unique dominant Hadamard component $y_\beta = K$, and let the dominance ratio be R. ==>

$K \le \sqrt{N}/(1 + (N-1)/R)$ .

Proof:  By  the  inverse  Hadamard  transform  we  have 
$y_a = (1/\sqrt{N}) \sum_\alpha h_{\alpha a} y_\alpha = (1/\sqrt{N}) \big( h_{\beta a} y_\beta + \sum_{\alpha \neq \beta} h_{\alpha a} y_\alpha \big)$ .  (283)

With $y_\beta = K$ and $|y_\alpha| \le K/R$, $\forall \alpha \neq \beta$, one has from (283)

$|y_a| \le (1/\sqrt{N})\,(K + (N-1)K/R)$ .  (284)

For the piecewise linear output function, the proportional region is given by

$|y_a| \le 1$ .  (285)

Then, (284) implies that y lies in the proportional region if

$K \le \sqrt{N}/(1 + (N-1)/R)$ .  (286)
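The estimate of Lemma 6 can be illustrated numerically. The sketch below is a hedged example, not the author's code: it assumes Sylvester-type Hadamard matrices (one common construction for N a power of 2), places the dominant component at index 0, and uses hypothetical function names. It evaluates the largest neuron component $|y_a|$ for a Hadamard spectrum whose non-dominant components are bounded by K/R.

```python
import numpy as np

def sylvester_hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def lemma6_bound(N, R):
    """Right-hand side of (286)."""
    return np.sqrt(N) / (1.0 + (N - 1) / R)

def max_abs_signal(H, y_hadamard):
    """Largest |y_a|, with y_a obtained by the inverse transform (283)."""
    N = H.shape[0]
    return np.max(np.abs(H.T @ y_hadamard)) / np.sqrt(N)

N, R = 8, 5.0
H = sylvester_hadamard(N)
K = lemma6_bound(N, R)
# Worst case: all non-dominant components equal to +K/R.
worst = np.full(N, K / R)
worst[0] = K
print(max_abs_signal(H, worst))
```

In the worst case the bound is attained, so K chosen according to (286) places the signal exactly on the boundary of the proportional region; any other choice of the non-dominant components within the limit K/R stays inside.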

We are interested in the dominance ratio at the point with $y_\beta = K$. Different cases
need to be considered. First, we look at the case with

$c_\beta = -N + 2$ .  (287)

Since the dominant index $\beta$ is assumed to be unique, and $c_\alpha$ has the range [-N, N], one
must have

$c_\alpha = -N$ , $\forall \alpha \neq \beta$ .  (288)

At this point we choose the coupling constant as

$\mu = 1/(4g^2N^2)$ ;  (289)

this choice satisfies (260), and with (259) and (287), condition (255) is satisfied for index $\beta$.
It follows that for index $\beta$ the roots $y_\pm$ are complex, so that the development of $y_\beta$ in time
is described by (220). But for the other indices, (288) and (259) show that $c_\alpha + c = 0$, so that
$y_\alpha = 0$, $\forall t > 0$. It follows that in this case the dominance ratio R is infinite, $\forall t > 0$ and, of
course, y in the proportional region. Hence, in this case the spectrum is pure from the start.

The next case considered has

$c_\beta = -N + 4$ .  (290)

If the next smaller Hadamard coefficient is $c_\alpha = -N$, then the considerations for the previous
case apply and we have $R = \infty$, $\forall t > 0$, in the proportional region. If the next smaller
coefficient after $c_\beta$ is

$c_\alpha = -N + 2$ ,  (291)

then both $y_\alpha$ and $y_\beta$ are given by a formula of the type (220), and we have, with $y_\beta = K$,

$R = K/y_\alpha = K \sin(\eta_\alpha + \nu_\alpha t)/(\rho_\alpha \sin \nu_\alpha t)$ ,  (292)
where $\rho_\alpha$, $\eta_\alpha$, and $\nu_\alpha$ are given by (228) to (230), with c=N; hence

$\rho_\alpha = \sqrt{\mu (c_\alpha/N + 1)}$ ,  (293)

$\cos \eta_\alpha = 1/(2gN\sqrt{\mu (c_\alpha + N)})$ ,  (294)

and $\nu_\alpha = \sqrt{g^2N^2\mu (c_\alpha + N) - 1/4}$ .  (295)

For the coupling constant given by (289) we have from (293) to (295)

$\rho_\alpha = \sqrt{c_\alpha/N + 1}/(2gN)$ ,  (296)

$\cos \eta_\alpha = 1/\sqrt{c_\alpha + N}$ ,  (297)

and $\nu_\alpha = (1/2)\sqrt{c_\alpha + N - 1}$ .  (298)


As the gain g or the dimension N is increased indefinitely, $\rho_\beta$ of (296) tends to zero.
Then, the equation $K = y_\beta$, for $y_\beta$ given by (220), viz.,

$K = \rho_\beta \dfrac{\sin \nu_\beta t}{\sin(\eta_\beta + \nu_\beta t)}$ ,  (299)

can be satisfied only if the denominator goes to zero, i.e., if

$\eta_\beta + \nu_\beta t \to \pi$ .  (300)

With (290), (297), and (298), the statement (300) gives

$t \to 4\pi/(3\sqrt{3})$ .  (301)

Using this result, together with the $\rho_\alpha$, $\eta_\alpha$, and $\nu_\alpha$ given by (296), (297), and (298) for the
$c_\alpha$ of (291) gives for the dominance ratio (292), after a short calculation,

$R \approx 1.38\, K g N^{3/2}$ ,  (302)

if either g or N is large. Using Lemma 6 with the equal sign together with (302) gives the
result

large g or large N: $R \approx 1.38\, g N^2 - N + 1$ .  (303)

Hence in this case the dominance ratio at the signal point with $y_\beta = K$ can be made as large
as desired by choosing either N or g large enough.
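The "short calculation" leading to the factor 1.38 can be reproduced numerically. This is a minimal sketch under stated assumptions: it uses the parameter formulas (296) to (298) as read here, the limiting time from (300), and hypothetical function names.

```python
import math

def params(c_alpha, N, g):
    """rho, eta, nu of (296)-(298) for the coupling constant mu = 1/(4 g^2 N^2)."""
    rho = math.sqrt(c_alpha / N + 1.0) / (2.0 * g * N)
    eta = math.acos(1.0 / math.sqrt(c_alpha + N))
    nu = 0.5 * math.sqrt(c_alpha + N - 1.0)
    return rho, eta, nu

def edge_time_limit(N):
    """Limiting time from (300) for the dominant coefficient c_beta = -N + 4."""
    _, eta_b, nu_b = params(-N + 4, N, 1.0)
    return (math.pi - eta_b) / nu_b

def ratio_factor(N):
    """R / (K g N^(3/2)) from (292), for the next coefficient c_alpha = -N + 2."""
    t = edge_time_limit(N)
    rho, eta, nu = params(-N + 2, N, 1.0)
    return math.sin(eta + nu * t) / (rho * math.sin(nu * t)) / N**1.5

def dominance_ratio_at_edge(N, g):
    """Large-g or large-N estimate (303)."""
    return ratio_factor(N) * g * N**2 - N + 1

print(edge_time_limit(16), ratio_factor(16), dominance_ratio_at_edge(16, 50.0))
```

The factor comes out independent of N, consistent with the statement that only the explicit powers of g and N control the growth of the dominance ratio.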


Next,  we  consider  the  case  with 
$c_\beta = -N + A$ , $A \ge 6$ .  (304)

The smallest dominance ratio results if the next smaller coefficient $c_\alpha$ is

$c_\alpha = c_\beta - 2$ .  (305)

For fixed $A \ge 6$, $\eta_\beta$, $\eta_\alpha$, $\nu_\beta$, and $\nu_\alpha$ do not depend on N or g. As g or N is increased
indefinitely, $\rho_\beta$ tends to zero, and we must have (300), as before. Statements (301), (302),
and (303) go through with different numerical factors; the important point is that these
factors do not depend on g or N. The cases considered exhaust all possibilities for smallest
R. Hence we have

Theorem 13: For a DLS with unsubtracted dynamics, externally coupled to a BLT with
$r_a = \mu (u_a + N^2 \delta_{a1})$ and a coupling constant $\mu = 1/(4g^2N^2)$ ==> the dominance
ratio R at the outer edge of the proportional region can be made arbitrarily large by taking
either the gain g or the dimension N large enough.


FINAL  PURIFICATION 

After  traversing  the  proportional  region,  the  activation  enters  the  region  where  the 
nonlinearity  of  the  neuron  output  function  is  important.  By  Theorem  13,  the  dominance 
ratio  at  the  edge  of  the  proportional  region  can  be  made  arbitrarily  large  by  choosing  a 
large  gain  or  dimension.  What  happens  to  the  dominance  ratio  in  the  region  where  the 
nonlinearity  in  the  output  function  is  important?  We  have 


Theorem 14: For a DLS with unsubtracted dynamics and a piecewise linear neuron output
function, externally coupled to the BLT output u by $r_a = \mu (u_a + N^2 \delta_{a1})$ with coupling
constant $\mu = 1/(4g^2N^2)$, and with the activation reset to zero at the time of application of
u, the signal y settles at the dominant Hadamard vector in u, if the gain g is large enough.


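Proof: For unsubtracted dynamics we have the equations of motion (34):

$\dot{v}_a = -v_a + N \sum_\alpha h_{\alpha a} y_\alpha^2 + \mu (u_a + N^2 \delta_{a1})$ ,  (306)

where (41), (3), and (259) have been used. Showing the $\beta$ term in the sum separately, this
is written as

$\dot{v}_a = -v_a + N \big( h_{\beta a} y_\beta^2 + \sum_{\alpha \neq \beta} h_{\alpha a} y_\alpha^2 \big) + \mu (u_a + N^2 \delta_{a1})$ .  (307)

Similarly, we write

$y_a = (1/\sqrt{N}) \sum_\alpha h_{\alpha a} y_\alpha = (1/\sqrt{N}) \big( h_{\beta a} y_\beta + \sum_{\alpha \neq \beta} h_{\alpha a} y_\alpha \big)$ .  (308)

There are separate arguments for $a = 1$ and $a \neq 1$. We start with the case $a \neq 1$. (307) gives

$\dot{v}_a = -v_a + N h_{\beta a} y_\beta^2 + N \sum_{\alpha \neq \beta} h_{\alpha a} y_\alpha^2 + \mu \sum_\alpha h_{\alpha a} c_\alpha$ .  (309)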
With the choice (289) for the coupling constant, one has for the last term

$|\mu \sum_\alpha h_{\alpha a} c_\alpha| \le \mu N^2 = 1/(4g^2)$ ,  (310)

since $|c_\alpha| \le N$, by (278), and the bipolar nature of the vectors x and $q_\alpha$. It follows that the
$|\mu \sum_\alpha h_{\alpha a} c_\alpha|$ can be made arbitrarily small by choosing the gain large enough. Next, consider
the term $\sum_{\alpha \neq \beta} h_{\alpha a} y_\alpha^2$ in (309). Let y be at the edge of the proportional region. Then,
$y_\beta = K$, and $|y_\alpha| \le K/R$, $\forall \alpha \neq \beta$, and it follows that

$|\sum_{\alpha \neq \beta} h_{\alpha a} y_\alpha^2| \le (N-1)K^2/R^2$ .  (311)

By Theorem 13, R can be made arbitrarily large by choosing g large enough. It follows that
the term $\sum_{\alpha \neq \beta} h_{\alpha a} y_\alpha^2$ tends to zero as g goes to infinity.

Since at the edge of the proportional region we must have

$|v_a| \le 1/g$ , $\forall a$ ,  (312)

the $v_a$ also tends to zero as g goes to infinity.

It follows that, for fixed dimension N and a fixed activation v at the edge of the
proportional region, we have

$a \neq 1$: $\dot{v}_a \to N h_{\beta a} y_\beta^2$ , as $g \to \infty$ .  (313)

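We still need to consider $a = 1$. The equation of motion (306) gives

$a = 1$: $\dot{v}_1 = -v_1 + N \sum_\alpha h_{\alpha 1} y_\alpha^2 + \mu \big( \sum_\alpha h_{\alpha 1} c_\alpha + N^2 \big)$ .  (314)

For the last term one has

$\mu \big( \sum_\alpha h_{\alpha 1} c_\alpha + N^2 \big) \ge 2\mu = 1/(2g^2N^2)$ ,  (315)

provided that the dominant index is unique; the proof is left to the reader. It follows that

$v_1 > 0$ , $\forall t > 0$ ,  (316)

and for the stationary point

$v_1 \ge N \sum_\alpha y_\alpha^2 + 2\mu$ .  (317)

From (308) we have

$(1/\sqrt{N}) \sum_{\alpha \neq \beta} h_{\alpha a} y_\alpha \to 0$ , as $g \to \infty$ ,  (318)

because of Theorem 13. Hence,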

because  of  Theorem  13.  Hence, 
$y_a \to (1/\sqrt{N})\, h_{\beta a} y_\beta = (K/\sqrt{N})\, h_{\beta a}$ , as $g \to \infty$ .  (319)

At the edge of the proportional region, $\exists$ b such that $|y_b| = 1$. By (319) one has, for this
index b,

$1 = |y_b| = K/\sqrt{N}$ , as $g \to \infty$ ,  (320)

so that

$K \to \sqrt{N}$ , as $g \to \infty$ .  (321)

Hence, we have from (319)

$y_a \to h_{\beta a}$ , $\forall a$ , as $g \to \infty$ .  (322)

(322) implies

$|y_a| \to 1$ , $\forall a$ , as $g \to \infty$ ,  (323)

and

$y_\beta \to \sqrt{N}$ , as $g \to \infty$ .  (324)

For a piecewise linear output function, the signal y does not always describe the
state unambiguously, and we then need to work with the activation v. From (324) we have

$a \neq 1$: $\dot{v}_a \to N^2 h_{\beta a}$ , as $g \to \infty$ .  (325)

For large g, (322) shows that the activations $v_a$ for $a \neq 1$ all have about the same
magnitude, 1/g. By (313), the time derivatives $\dot{v}_a$, $a \neq 1$, all have the same magnitudes,
and their signs are the same as the signs of $y_a$. It follows that shortly after the activation v
leaves the proportional region, all components $v_a$, $a \neq 1$, are still about the same in magnitude, and all
exceed 1/g in magnitude. Hence, at that time we have

$a \neq 1$: $y_a = h_{\beta a}$ exactly .  (326)

Eq. (313) remains true, so that for $a \neq 1$ the changes of $v_a$ brought about by $\dot{v}_a$ of (313)
always have the same sign as $v_a$, and $y_a = h_{\beta a}$ does not change at all (remember s(.) is the
piecewise linear function). Hence, for $a \neq 1$ a stationary point is reached with

$|v_a| > 1/g$ .  (327)

For $a = 1$, we have for the stationary state, by (317),

$v_1 \ge N \sum_\alpha y_\alpha^2 + 2\mu \to N y_\beta^2 + 2\mu = N^2 + 2\mu$ , as $g \to \infty$ ,  (328)

by (324). Since, for large g, $N^2 + 2\mu > 1/g$, we have for the stationary point

$y_1 = 1$ , for large g.  (329)
(329) 
(327), (322), and (329) show that, for large g, the state moves to the stationary point

$y_a = h_{\beta a}$ , $\forall a$ .  (330)


Two comments are in order. First, the condition $\mu = 1/(4g^2N^2)$ on the coupling
constant can be relaxed to an inequality $1/(4g^2N^2) \le \mu < c/(4g^2N^2)$, where c>1 is some
number that can be determined with further effort. This relaxation of the coupling constant
condition is important in practice, where we want a margin for structural parameters. We
have not used an inequality for $\mu$ in order to keep the proof simple. The estimations used
can be sharpened considerably; the sharper the estimates, the larger c can be.

The  second  comment  is  of  a  more  serious  nature.  However  much  the  estimates  used 
in the proof of Theorem 14 are sharpened, the maximum coupling constant $\mu$ allowed is
expected to be still small, because the basic factor $1/(4g^2N^2)$ is so small ($3.9 \times 10^{-7}$ for
N=16 and g=50). The practical problem with such small coupling constants is that they
make  the  DLS  exceedingly  slow,  because  a  very  long  time  would  be  needed  to  develop  the 
DLS  activation  to  appreciable  levels,  after  reset  to  zero  at  the  time  that  the  BLT  output  u 
is  applied.  In  practice  we  need  large  gain  and  a  coupling  constant  of  the  order  of  unity,  for 
the  sake  of  speed.  Moreover,  the  important  practical  applications  of  the  SRM  are  expected 
to  have  large  dimension  N.  It  is  clear  that  for  such  fast  SRMs  Theorem  14  does  not  provide 
an  assurance  of  perfect  associative  recall.  Numerical  computations  have  shown  excellent 
performance  for  N=16  and  coupling  constants  as  large  as  0.2,  but  this  does  not  imply  that 
such  machines  with  much  larger  dimension  would  work  for  practical  values  of  the  coupling 
constant. 
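The size of the gap between the theorem and practice is easy to quantify. The following small sketch (the function name is hypothetical) evaluates the coupling constant required by Theorem 14 and compares it with the value 0.2 used in the computations.

```python
def theorem_coupling(g, N):
    """Coupling constant mu = 1/(4 g^2 N^2) required by Theorem 14."""
    return 1.0 / (4.0 * g**2 * N**2)

mu = theorem_coupling(g=50.0, N=16)
print(mu)        # about 3.9e-07
print(0.2 / mu)  # factor by which the practical value exceeds it
```

For N=16 and g=50 the practical coupling constant is larger than the theorem's value by a factor of roughly half a million, and the required value shrinks further as N grows.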

The  basic  problem  is  that  the  spectral  purification  processes  captured  by  Theorems 
13  and  14  do  not  describe  the  powerful  purification  that  goes  on  in  a  DLS  for  signals  close 
to  the  boundary  of  the  solid  hypercube.  Such  purification  is  observed  clearly  on  the 
computer.  A  strong  effort  is  needed  to  study  this  point  and  to  provide  a  theorem  that 
covers  practical  values  of  the  coupling  constant. 


NUMERICAL  COMPUTATIONS 

Computations  have  been  carried  out  to  investigate  the  associative  recall  of  SRMs  of 
dimensions  N=8  and  16.  In  these  computations,  a  bipolar  vector  x  is  presented  to  the  SRM 
front stage, and the output x' of the SRM is compared with the stored vectors $q_\alpha$, $\alpha$ = 1 to
N. The SRM has perfect associative recall if for every bipolar input vector x the output x'
is the stored vector nearest x, provided that the nearest stored vector is unique. The
latter condition is accounted for by letting the computer skip input vectors x which have
multiple nearest stored vectors. With these exceptions, the test of associative recall is
applied to all vectors x of the N dimensional hypercube $I_N = \{-1, 1\}^N$.
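The test procedure just described can be sketched as a loop over the hypercube. This is an illustrative outline, not the program used for the report: `recall` is a hypothetical callable standing in for the full SRM (BLT forward stroke, DLS settling, back stroke), and the names are assumptions.

```python
import itertools
import numpy as np

def unique_nearest(Q, x):
    """Index of the unique nearest stored vector (rows of Q), or None on a tie."""
    c = Q @ x
    top = np.flatnonzero(c == c.max())
    return int(top[0]) if top.size == 1 else None

def perfect_recall(Q, recall):
    """Run the recall test over all bipolar inputs; return (passed, inputs tested)."""
    N = Q.shape[1]
    tested = 0
    for bits in itertools.product((-1, 1), repeat=N):
        x = np.array(bits)
        beta = unique_nearest(Q, x)
        if beta is None:
            continue  # skip inputs with multiple nearest stored vectors
        tested += 1
        if not np.array_equal(recall(x), Q[beta]):
            return False, tested
    return True, tested

# Hypothetical usage with an idealized recall that simply picks the nearest vector.
Q = np.array([[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]])
print(perfect_recall(Q, lambda x: Q[np.argmax(Q @ x)]))
```

In an actual run, `recall` would integrate the DLS equations of motion for each input, which is what makes the full sweep expensive.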

The N stored bipolar vectors $q_\alpha$ were chosen at random, but were kept fixed during
the run of x over the $I_N$. In the early stages of the project, before the theory was
adequately developed, we had some notions that compelled us to store only N-1 vectors,
which were subject to the condition that their first component always be +.

The  SRM  front  stage  is  the  BLT,  which  in  the  computations  performs  a  linear 
transformation of x by the matrix B, as shown in (1). The matrix B is the sum of outer
products of the stored vectors and their labels, which are Hadamard vectors. The label for
the stored vector $q_\alpha$ is the Hadamard vector $h_\alpha$. The BLT output is the vector u. In

calculating  u,  the  response  time  of  the  BLT  amplifiers  has  been  neglected.  This  was  done  to 
speed  up  the  computations,  and  because  a  finite  response  time  is  not  expected  to  change 
the  results  of  the  associative  recall  computations.  The  basis  for  this  expectation  is  the 
absence  of  feedback  in  the  BLT;  there  is  no  settling  of  u  other  than  the  smooth  growth 
from  zero  to  the  full  output. 
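The forward stroke of the BLT, with amplifier response time neglected, amounts to one matrix product. The sketch below is a hedged illustration: it assumes $B = \sum_\alpha h_\alpha q_\alpha^T$ as the "sum of outer products" described above, uses Sylvester-type Hadamard matrices as one possible choice of labels, and all function names are hypothetical.

```python
import numpy as np

def sylvester_hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def blt_matrix(H, Q):
    """B = sum over alpha of outer(h_alpha, q_alpha); rows of H and Q are h_alpha, q_alpha."""
    return H.T @ Q

def blt_forward(B, x):
    """BLT output u = B x, a linear combination of Hadamard vectors."""
    return B @ x

def hadamard_coefficients(H, u):
    """Coefficients c_alpha in u = sum_alpha c_alpha h_alpha (exact for integer data)."""
    return (H @ u) // H.shape[0]

rng = np.random.default_rng(1)
N = 8
H = sylvester_hadamard(N)
Q = rng.choice([-1, 1], size=(N, N))   # random stored bipolar vectors
x = rng.choice([-1, 1], size=N)        # bipolar input
u = blt_forward(blt_matrix(H, Q), x)
print(hadamard_coefficients(H, u), Q @ x)
```

The two printed arrays agree, showing that the coefficient of each label $h_\alpha$ in u is the overlap $c_\alpha = q_\alpha \cdot x$.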

The  BLT  output  u  is  presented  to  the  rear  stage,  the  DLS,  for  further  processing.  In 
the  computations,  both  coupling  schemes,  the  initial  value  coupling,  and  the  external 
coupling,  have  been  used.  In  early  stages  of  the  project  we  used  initial  value  coupling  out  of 
fear that external coupling would make spurious states possible. This is indeed the case, but
we have since found out that, although spurious states exist, they are dynamically
inaccessible in the external coupling scheme, provided that the coupling constant and gain
are chosen properly. The initial value coupling was first applied with very small coupling
constants, typically $10^{-7}$. This coupling constant is just large enough to place the initial

activation  outside  of  the  central  crater,  so  that  the  state  does  not  fall  back  to  the  stable 
origin.  For  N=16,  perfect  associative  recall  was  found.  However,  the  computations  were 
very  slow  because  of  the  small  coupling  constant.  In  practice,  this  would  show  up  as  a  slow 
settling  of  the  DLS.  Moreover,  in  practice  the  weak  initial  value  coupling  would  be 
vulnerable  to  noise.  It  was  decided  to  diminish  this  vulnerability  by  increasing  the  coupling 
constant.  It  was  found  that  then  the  gain  must  be  diminished,  in  order  to  keep  the  recall 
perfect. The small gain was found to slow down the DLS. In response to these difficulties,
we reexamined the external coupling, and started bearing down on the question of dynamic
accessibility of the spurious states which are brought into existence by removing the safe
threshold (the term $(N^2 - 4N) h_{1a}$ in (66)).

Extensive computations were done for the external coupling scheme. The strength of
this coupling, i.e., the coupling constant $\mu$ in (41), affects the settling speed of the DLS, in a
similar way as for the initial value coupling. In this regard, Theorem 14, which guarantees
that the DLS works if the gain is chosen large enough, is subject to the condition (289) on
$\mu$, and the coupling constant values allowed are much too small in practice. Therefore, the
numerical computations were done with much larger coupling constants, typically 0.2.
Perfect associative recall was found for these SRMs, with gains of 50. We were interested in
the spectrum purification, as the state developed from reset at zero to the dominant
Hadamard corner. A very strong purification was noticed upon entering the partially
supercritical region. The entering of this region was easily spotted when the piecewise
linear output function was used, since then the entirely subcritical region is given by
$|v_a| < 1/g$, where g is the gain. The strong purification observed in the supercritical region
is not exposed by Theorems 13 and 14, and hence, these theorems do not get at what makes
the DLS work with practical values of the coupling constant.

Besides  the  two  coupling  schemes,  a  choice  had  to  be  made  between  subtracted  and 
unsubtracted connection tensors, and between using $y_1$ clamping or not. The computations
range over a number of cases, but do not cover all possibilities because of the computer
time involved. The associative recall computations require, for each input vector x, the
numerical integration of N coupled nonlinear differential equations of first order. For
N=16, the computation runs over $2^{16}$ = 64K input vectors, although some of these are
skipped because of multiple nearest vectors.

On a number of occasions it was observed that $y_1$ clamping speeds up the settling of
the DLS, and we expect this to be true in general.

The numerical integration of the DLS equations of motion was done in time steps
which typically had the duration $\delta t = 10^{-5}$. The time step must be compared to the RC
time of unity for the equations of motion in the normalized form (15).

We need to discuss the cutoff for the numerical integration. After considerable
experimentation, we settled on a scheme in which the integration is terminated if, for all
indices b=1 to N, { $|v_b| > 5/g$ and $|v_b(t)| > |v_b(t - 10\,\delta t)|$ }. The factor 10 in $t - 10\,\delta t$ was
chosen to accommodate fluctuations in stochastic computations which use the same computer
program.  The  stochastic  computations  will  not  be  discussed  here,  but  will  be  reported 
separately. 
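The termination rule can be written as a small predicate. This is a sketch only: the function name and the way the lagged activation is supplied are assumptions, and the surrounding integration loop is not shown.

```python
import numpy as np

def should_stop(v_now, v_lagged, g):
    """Cutoff test: every |v_b| exceeds 5/g and has grown relative to its
    value ten time steps earlier (passed in as v_lagged)."""
    a_now = np.abs(v_now)
    a_lag = np.abs(v_lagged)
    return bool(np.all((a_now > 5.0 / g) & (a_now > a_lag)))

# Hypothetical activations for a two-neuron example with gain 50 (threshold 5/g = 0.1).
print(should_stop(np.array([0.2, -0.3]), np.array([0.15, -0.25]), 50.0))
print(should_stop(np.array([0.2, -0.05]), np.array([0.15, -0.04]), 50.0))
```

In an integration loop one would keep the activation from ten steps back (for example in a short ring buffer) and call the predicate after each step.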

The  following  is  a  partial  list  of  associative  recall  computations  performed.  All  these 
runs  showed  perfect  associative  recall. 

1)  N=16,  initial  value  coupling  with  /r=lxl0  ,  unsubtracted  dynamics,  y^ 
clamping.  Hyperbolic  tangent  output  function  with  gain  of  10. 

2)  N=16,  initial  value  coupling  with  /x=lxl0  ,  subtracted  dynamics,  no  y^ 
clamping.  Hyperbolic  tangent  output  function  with  a  gain  of  10. 

3)  N=8, initial value coupling by Eq. (45) with $\mu$=0.0001 to 0.1, with zero threshold
(c=0 in Eq. (44)). Piecewise linear output function with gain of 0.25. Unsubtracted
dynamics with $y_1$ clamping.

4)  N=8, external coupling by Eq. (41) with c=0 and $\mu$=1.0. N unrestricted stored
bipolar vectors. Unsubtracted dynamics without $y_1$ clamping. Piecewise linear output
function with gain of 20.

5)  N=16, external coupling by Eq. (41) with c=0 and $\mu$=0.5. N-1 stored bipolar
vectors with first component +. Subtracted dynamics with $y_1$ clamping. Hyperbolic
tangent output function with gain of 50.

6)  N=16, external coupling by Eq. (41) with c=0, and $\mu$=0.2. N-1 stored bipolar
vectors with first component +. Unsubtracted dynamics with $y_1$ clamping. Hyperbolic
tangent output function with gain of 50.

7)  N=16,  external  coupling  by  Eq.  (41)  with  c=0,  and  N-l  stored  bipolar 

vectors with first component +. Unsubtracted dynamics with $y_1$ clamping. Piecewise linear
output function with gain of 50.

8)  N=16, external coupling by Eq. (41) with c=N, and $\mu$=0.2. N unrestricted
bipolar stored vectors. Piecewise linear output function with gain of 50. Unsubtracted
dynamics. No $y_1$ clamping.

9)  Same as 8), but with $y_1$ clamping.
CONCLUSION

The  main  result  is  contained  in  Theorem  14.  This  theorem  assures  perfect 
associative  recall  of  an  SRM  consisting  of  a  BLT,  externally  coupled  to  a  DLS  built  as  a 
quadratic  Hadamard  memory,  if  the  coupling  constant  is  as  specified,  and  the  gain  is  large 
enough.  The  theorem  holds  for  any  dimension  that  is  a  power  of  2.  The  theorem  is  derived 
for  unsubtracted  dynamics  and  a  piecewise  linear  output  function.  It  is  good  to  have  such  a 
theorem  that  is  valid  for  the  large  dimensions  that  are  expected  to  be  important  in 
applications.  The  condition  of  large  gain  in  Theorem  14  is  likely  to  be  met  in  practice, 
because  large  gains  are  desirable  in  order  to  have  a  fast  acting  DLS.  Of  course,  the  values 
of  the  gain  required  would  have  to  be  determined  by  sharpening  the  estimates.  The  small 
value of the coupling constant $\mu$ required by the theorem constitutes a problem in practical
applications,  however.  Such  small  coupling  constants  would  slow  down  the  DLS  to  an 
unacceptable  degree.  Numerical  computations  of  the  signal  development  in  the  DLS  for 
coupling  constants  as  large  as  0.2  have  shown  that  it  is  not  necessary  to  have  a  large 
dominance  ratio  at  the  edge  of  the  proportional  region;  a  very  strong  spectral  purification 
occurs  in  the  final  stages  of  state  development,  outside  the  proportional  region.  This 
purification  is  due  to  the  clipping  done  by  the  output  function,  in  a  manner  that  is  not 
understood  at  present.  This  mechanism  is  not  captured  by  the  present  analysis. 

Theorem  1  excludes  limit  cycles  from  the  DLS  dynamics.  Theorem  2  together  with 
Theorem  3  essentially  show  that  no  spurious  stable  states  exist  provided  that  conditions  on 
thresholds,  gain,  and  dimension  are  met.  These  conditions  are  easily  satisfied  in  practice, 
but  the  thresholds  prescribed  are  inconsistent  with  the  requirement  of  gentle  nudging  of  the 
state  into  the  "correct"  gully  by  the  external  force,  after  reset  to  the  origin.  Theorem  2  is 
in  agreement  with  results  obtained  from  the  asynchronous  discrete  DLS  model  discussed  in 

[9]. For the continuous DLS of practical applications, the large threshold term $N^2 - 4N$ of
(66) must be dropped, and the vector $r_a$ must be taken as specified in (41), with c=N, as
determined by (259). Doing this lifts the shield against spurious states, but Theorem 14
assures that the final state is the dominant Hadamard vector, as desired, provided that the
coupling constant is as specified and the gain is large enough.

This  result  can  be  understood  from  the  features  of  the  energy  landscape.  The 
external coupling tilts the energy surface just a little, if the coupling constant $\mu$ is small.
The tilt also is necessary for the state to move away from the starting point, the origin. For
very large values of $\mu$ the tilt is so severe as to destroy the stability of the Hadamard points.
For a range of intermediate values of $\mu$ one expects proper behavior of the DLS. It would be
valuable to determine this range from theory, since for the practical dimensions N>32 a
complete  check  of  associative  recall  by  numerical  computation  is  out  of  the  question 
because  of  the  computer  time  required. 

Theorem 3 assures the instability of any stationary signal whose components $y_a$ do
not all have values near ±1. An example is given by signals with a principal
spectrum of even order m. Such a signal has components $y_a$ that are zero, because of the

nature  of  Hadamard  vectors.  For  m=2  the  spectrum  is  conserved  in  the  proportional 
region,  by  Theorem  12,  and  also  further  out,  beyond  the  proportional  region,  because  the 
signal  trajectory  is  a  straight  line  through  the  origin  in  this  case.  By  Theorems  10  and  11, 
there  is  a  stationary  point  with  this  spectrum,  if  r=0.  By  Theorem  3,  this  stationary  point 
is  unstable. 

Some  of  the  theorems  specify  the  hyperbolic  tangent  threshold  function,  while  others 
use the piecewise linear function. These choices have been made to expedite the proofs. It is
expected that the results obtained go through for a large class of sigmoid functions.

In  regard  to  the  choice  between  initial  value  coupling  and  external  coupling,  it  is 
noted  that  from  a  theoretical  point  of  view  the  initial  value  coupling  is  cleaner,  because  it 
does  not  tilt  the  energy  surface.  Then,  all  Hadamard  points  are  on  the  same  footing  in 
regard  to  stabilizing  forces.  Since  for  zero  threshold  we  have  the  stable  point  at  the  origin 
(by Theorem 6), the coupling constant $\mu$ must be chosen large enough to place the initial
signal outside of the crater. Increasing $\mu$ means placing the initial signal farther out, and
hence has the result of speeding up the DLS action. However, increasing $\mu$ also leaves less
time for spectral purification, and we may expect a breakdown in associative recall for very
large coupling constants. From the practical standpoint, the external coupling is preferred,
because it provides for a separate input and output of the DLS. In the presence of noise,
the external coupling has the advantage that the BLT output u remains standing on the
input, whereas the original information u is lost soon after reset in the initial value
coupling scheme.

It  may  be  surprising  that  so  much  theory  can  be  developed  about  a  neural  net  with 
nonlinear  activation;  the  dynamics  in  the  continuum  model  is  governed  by  N  coupled 
nonlinear  differential  equations.  Two  kinds  of  nonlinearities  are  present:  the  familiar 
sigmoid  nonlinearity  in  the  neuron  output  function,  and  the  quadratic  activation,  involving 
the  Hadamard  business.  The  reason  for  the  possibility  of  extensive  theoretical  development 
is,  of  course,  that  the  stored  vectors  in  the  DLS  are  Hadamard  vectors.  The  properties  of 
these  vectors,  orthonormality  and  others,  allow  deductions  and  calculations  that  would  not 
be  possible  in  more  general  or  different  settings. 

The  DLS  may  be  seen  as  a  distributed  winner  take  all  circuit.  As  such,  it 
may  have  applications  beyond  the  one  described  here,  as  second  stage  of  an  SRM. 

The  Hadamard  matrices  constructed  from  cyclic  S  matrices  can  be  generated  by 
shifts of the vector $z_{N-1}$ (see Appendix A). This provides a method of constructing
Hadamard vectors in hardware, but for large dimension N the implementation of the vector
$z_{N-1}$ becomes cumbersome. Then, one may consider using random bipolar vectors instead

of  Hadamard  vectors.  For  large  dimension,  random  bipolar  vectors  have  a  high  probability 
of  being  nearly  orthogonal,  and  we  may  expect  most  of  the  theory  to  go  through  "on  the 
average".  Alternatively,  one  can  start  with  an  index  group  defined  by  a  structure  function 
which  is  invariant  under  permutations,  and  use  this  group  instead  of  the  Hadamard  group. 
The  structure  function  may  be  chosen  at  random,  and  it  would  commit  the  connection 
tensor  Sa^c  in  the  equations  of  motion  (15).  The  symmetry  of  S  ^  is  then  a  consequence 

of  the  invariance  of  the  group  structure  under  index  permutations.  We  see  a  simple  way  of 

generating  permutation  invariant  structure  functions  by  use  of  XOR. 
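The claim that random bipolar vectors are nearly orthogonal in large dimension can be illustrated numerically. The following is a minimal sketch, not part of the report; the helper names, the seed, and the choice of 8 vectors per dimension are assumptions made only for illustration. It prints the largest normalized overlap |u.v|/N found among the vector pairs, which is expected to shrink roughly like 1/sqrt(N).

```python
import random

def random_bipolar(n, rng):
    """Return a list of n entries drawn uniformly from {+1, -1}."""
    return [rng.choice((1, -1)) for _ in range(n)]

def max_normalized_overlap(vectors):
    """Largest |u.v|/N over all distinct pairs; 0 means exactly orthogonal."""
    n = len(vectors[0])
    worst = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(dot) / n)
    return worst

rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
for n in (16, 256, 4096):
    vecs = [random_bipolar(n, rng) for _ in range(8)]  # 8 vectors: illustrative only
    print(n, round(max_normalized_overlap(vecs), 3))
```

For a single pair the overlap has standard deviation 1/sqrt(N), so exact orthogonality is not obtained; this is the sense in which the theory could hold only "on the average".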

The  bipolar  vectors  that  would  replace  the  Hadamard  vectors  in  this  scheme  need  to 
be  computed  by  the  neural  net  because  they  must  serve  as  labels  for  the  stored  states  in  the 
BLT.  But  if  everything  goes  as  expected,  such  states  arise  naturally  as  settled  states  of  the 
DLS,  hooked  up  according  to  the  group  structure.  By  Hebb  learning  these  states  can  be 
impressed,  together  with  the  stored  states  (as  outer  products)  upon  the  BLT  connection 
matrix  B. 


APPENDIX A

In this appendix a discussion is given of Hadamard matrices and Hadamard vectors,
insofar as needed in this report. Hadamard vectors are bipolar vectors that form an
orthonormal set. The norm of the vectors is not unity, but $\sqrt{N}$, where N is the dimension,
here restricted to be a power of 2. A square matrix, the rows of which are Hadamard
vectors, is called a Hadamard matrix. Hadamard matrices have important applications in
signal processing involving multiplexing; they are used in spectrometers and imagers to
improve the signal-to-noise ratio. Optics employing this method is called "Hadamard
transform optics" [13]. Hadamard matrices are also used in error-correcting codes [13].
They are used here simply because in our SRM there is a need for orthogonal labels that
are bipolar vectors.

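The Hadamard vectors are denoted by $h_\alpha$, $\alpha = 1$ to N. Labeling the vector
components by the index $a = 1$ to N, the Hadamard matrix with rows $h_\alpha$ has the elements
$h_{\alpha a}$. The orthonormality of the Hadamard vectors is expressed by

$$ h_{\alpha a}\, h_{\beta}{}^{a} = N \delta_{\alpha\beta} , \tag{A1} $$

where $\delta_{\alpha\beta}$ is the Kronecker delta, and summation over the repeated index a is implied.
From (A1) and the linear independence of the Hadamard vectors a second set of
orthonormality conditions can be derived:

$$ h_{\alpha a}\, h^{\alpha}{}_{b} = N \delta_{ab} . \tag{A2} $$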

In this report we restrict the Hadamard matrices to be symmetric, and to have
solely + in their first row. For dimensions N that are powers of 2 there are the
Sylvester-type Hadamard matrices [13], which are defined recursively by the scheme

$$ H_{2N} = \begin{pmatrix} H_N & H_N \\ H_N & -H_N \end{pmatrix} , \qquad H_2 = \begin{pmatrix} + & + \\ + & - \end{pmatrix} . \tag{A3} $$
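The recursive scheme for the Sylvester-type matrices can be sketched in a few lines. This is an illustrative implementation only; the function names `sylvester` and `show` are not from the report, and starting the recursion from the 1x1 matrix (+) is a convenience that yields the same $H_2$ after one doubling step.

```python
def sylvester(n):
    """Sylvester-type Hadamard matrix of order n (n a power of 2), entries +1/-1."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = [[1]]
    while len(h) < n:
        # Doubling step: stack [H, H] on top of [H, -H]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def show(h):
    """Render a bipolar matrix with '+' and '-' characters, one row per line."""
    return "\n".join("".join("+" if x > 0 else "-" for x in row) for row in h)

H16 = sylvester(16)
print(show(H16))
```

The printed matrix is symmetric, has solely + in its first row, and its rows are mutually orthogonal with squared norm 16.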


For example, the Sylvester-type Hadamard matrix of dimension 16 is shown below.

++++++++++++++++
+-+-+-+-+-+-+-+-
++--++--++--++--
+--++--++--++--+
++++----++++----
+-+--+-++-+--+-+
++----++++----++
+--+-++-+--+-++-
++++++++--------
+-+-+-+--+-+-+-+
++--++----++--++
+--++--+-++--++-
++++--------++++
+-+--+-+-+-++-+-
++----++--++++--
+--+-++--++-+--+

There is a second type of Hadamard matrix. These matrices are constructed from
so-called cyclic S matrices [13], and they exist only for certain dimensions N, including all
powers of 2. The construction of a matrix of this type starts with choosing the first
Hadamard vector to have all components +. The remaining N-1 Hadamard vectors are
found by taking their first component as +, and the remaining N-1 components as left
shifts of a special (N-1)-dimensional bipolar vector $z_{N-1}$, the construction of which is
discussed by Harwit and Sloane [13]. They also show a list of such vectors for several small
values of N. Examples taken from [13] are:

 3   -+-
 7   ---+-++
11   --+---+++-+
15   +++-++--+-+----

The Hadamard matrix constructed from $z_{15}$ for N = 16 is

++++++++++++++++
++++-++--+-+----
+++-++--+-+----+
++-++--+-+----++
+-++--+-+----+++
+++--+-+----+++-
++--+-+----+++-+
+--+-+----+++-++
+-+-+----+++-++-
++-+----+++-++--
+-+----+++-++--+
++----+++-++--+-
+----+++-++--+-+
+---+++-++--+-+-
+--+++-++--+-+--
+-+++-++--+-+---
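The construction from left shifts can be checked directly. The sketch below is illustrative and not taken from the report; the function name `cyclic_hadamard` is an assumption, and the string `Z15` is the 15-component vector listed in this appendix.

```python
Z15 = "+++-++--+-+----"  # the vector z_15 listed in this appendix

def cyclic_hadamard(z):
    """Hadamard matrix of order len(z)+1 built from left shifts of z."""
    zs = [1 if c == "+" else -1 for c in z]
    m = len(zs)
    rows = [[1] * (m + 1)]                # first vector: all components +
    for k in range(m):
        shifted = zs[k:] + zs[:k]         # left shift of z by k places
        rows.append([1] + shifted)        # first component is always +
    return rows

H = cyclic_hadamard(Z15)
N = len(H)
is_symmetric = all(H[i][j] == H[j][i] for i in range(N) for j in range(N))
is_orthogonal = all(
    sum(H[i][k] * H[j][k] for k in range(N)) == (N if i == j else 0)
    for i in range(N) for j in range(N)
)
print(is_symmetric, is_orthogonal)
```

Because row k uses a left shift by k-2 places, the element in row k and column j depends only on k+j, which is why the resulting matrix is symmetric.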


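In this report we use the Hadamard matrices constructed from cyclic S matrices.
The dimension N is restricted to a power of 2. For these Hadamard vectors $h_\alpha$ we have

$$ \sum_\alpha h_{\alpha a} = N \delta_{a1} ; \tag{A4} $$

by the symmetry of the Hadamard matrices used here, this may also be written as

$$ \sum_a h_{\alpha a} = N \delta_{\alpha 1} . \tag{A5} $$

We further have

$$ h_{1a} = 1 , \ \forall a , \quad \text{and} \quad h_{\alpha 1} = 1 , \ \forall \alpha . \tag{A6} $$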

The Hadamard transform $p_\alpha$ of a vector $p_a$ is given by

$$ p_\alpha = (1/\sqrt{N})\, h_{\alpha}{}^{a}\, p_a . \tag{A7} $$

The same transformation is used for contravariant vectors:

$$ p^\alpha = (1/\sqrt{N})\, h^{\alpha}{}_{a}\, p^a . \tag{A8} $$

Indices are raised and lowered with the Kronecker $\delta$ as metric tensor. The Hadamard
transform is an orthogonal transformation, i.e., it preserves scalar products. As a result,
one has

$$ y_\alpha y^\alpha = y_a y^a . \tag{A9} $$

The inverse Hadamard transforms are:

$$ p_a = (1/\sqrt{N})\, h^{\alpha}{}_{a}\, p_\alpha , \tag{A10} $$

$$ p^a = (1/\sqrt{N})\, h_{\alpha}{}^{a}\, p^\alpha . \tag{A11} $$

For any vector p, the $p_a$ may be seen as the components in the neuron frame,
whereas the $p_\alpha$ are the components of the vector in the Hadamard frame.
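The change between the neuron frame and the Hadamard frame can be tried out numerically. The following sketch is an illustration under stated assumptions: the function names are hypothetical, the test vector `p` is arbitrary made-up data, and the matrix is the cyclic-S matrix for N = 16 rebuilt from the vector $z_{15}$ of this appendix. It checks that the transform preserves the scalar product and that the inverse transform recovers the original components.

```python
import math

Z15 = "+++-++--+-+----"  # the vector z_15 listed in this appendix

def cyclic_hadamard(z):
    """Hadamard matrix of order len(z)+1 built from left shifts of z."""
    zs = [1 if c == "+" else -1 for c in z]
    rows = [[1] * (len(zs) + 1)]
    for k in range(len(zs)):
        rows.append([1] + zs[k:] + zs[:k])
    return rows

def to_hadamard_frame(h, p):
    """p_alpha = (1/sqrt(N)) * sum over a of h[alpha][a] * p[a]."""
    n = len(h)
    return [sum(h[al][a] * p[a] for a in range(n)) / math.sqrt(n) for al in range(n)]

def to_neuron_frame(h, q):
    """p_a = (1/sqrt(N)) * sum over alpha of h[alpha][a] * q[alpha]."""
    n = len(h)
    return [sum(h[al][a] * q[al] for al in range(n)) / math.sqrt(n) for a in range(n)]

H = cyclic_hadamard(Z15)
p = [0.5 * a - 3.0 for a in range(16)]   # arbitrary test vector (hypothetical data)
q = to_hadamard_frame(H, p)
back = to_neuron_frame(H, q)

# Scalar product is preserved, and the inverse transform recovers p
print(abs(sum(x * x for x in p) - sum(x * x for x in q)) < 1e-9)
print(max(abs(x - y) for x, y in zip(p, back)) < 1e-9)
```

Applied to a Hadamard vector itself, the forward transform gives a single nonzero component of size sqrt(N) in the Hadamard frame.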

In the sequel we need the group property of the Hadamard vectors. This property is
discussed in detail in [9], and we only give here the results without proof. We have

Theorem A1: For dimensions that are a power of 2, the Hadamard vectors constructed
from cyclic S matrices form a group under componentwise multiplication.

This means that

$$ h_{\alpha a}\, h_{\beta a} = h_{\gamma a} , \ \forall a , \tag{A12} $$

where

$$ \gamma = f(\alpha,\beta) ; \tag{A13} $$

f is called the structure function of the group. The componentwise multiplication is a
logical XOR, so that (A13) may also be expressed as

$$ h_\alpha \ \mathrm{XOR} \ h_\beta = h_\gamma . \tag{A14} $$

The structure function can be determined from the Hadamard vectors. For example, we
have, for N=16, f(2,3)=6, and f(2,4)=10, as can be seen from the Hadamard matrix
constructed from cyclic S matrices for N = 16 shown earlier in this appendix. The first
Hadamard vector, $h_1$, has all components + and therefore acts as the identity. It is easy
to see that every Hadamard vector is its own inverse,

$$ h_{\alpha a}\, h_{\alpha a} = h_{1a} , \ \forall \alpha , \ \forall a \quad \text{(no summation)} . \tag{A15} $$

For the structure function, (A15) implies

$$ f(\alpha,\alpha) = 1 , \ \forall \alpha . \tag{A16} $$

Moreover,

$$ f(\alpha,\beta) = 1 \implies \alpha = \beta , \tag{A17} $$

as can be seen as follows. $h_{\alpha a} h_{\beta a} = h_{1a}$, $\forall a$, implies

$$ h_{\alpha a}\, h_{\beta a}\, h_{\beta a} = h_{1a}\, h_{\beta a} , \tag{A18} $$

and since $h_{\beta a} h_{\beta a} = 1$ and $h_{1a} = 1$, this gives $h_{\alpha a} = h_{\beta a}$ for all a, i.e., $\alpha = \beta$.

Because of the symmetry of the Hadamard matrices used, the roles of indices $\alpha$ and a may
be reversed, so that we also have

$$ h_{\alpha a}\, h_{\alpha b} = h_{\alpha c} , \ \forall \alpha , \tag{A19} $$

where

$$ c = f(a,b) . \tag{A20} $$

The function f is symmetric,

$$ f(a,b) = f(b,a) , \tag{A21} $$

and (A20) is invariant under cyclic permutation of indices, so that (A20) implies

$$ b = f(c,a) . \tag{A22} $$
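The group property and the structure function can be verified by brute force for N = 16. The sketch below is illustrative only; the helper names are assumptions, and indices are 1-based to match the text. It multiplies every pair of Hadamard vectors componentwise, looks the product up among the rows, and confirms the values f(2,3)=6 and f(2,4)=10 quoted in the text.

```python
Z15 = "+++-++--+-+----"  # the vector z_15 listed in this appendix

def cyclic_hadamard(z):
    """Hadamard matrix of order len(z)+1 built from left shifts of z."""
    zs = [1 if c == "+" else -1 for c in z]
    rows = [[1] * (len(zs) + 1)]
    for k in range(len(zs)):
        rows.append([1] + zs[k:] + zs[:k])
    return rows

def structure_function(h):
    """Return f as a dict with 1-based indices: f[(a, b)] = c where h_a * h_b = h_c."""
    index_of = {tuple(row): i + 1 for i, row in enumerate(h)}
    f = {}
    for a, row_a in enumerate(h):
        for b, row_b in enumerate(h):
            prod = tuple(x * y for x, y in zip(row_a, row_b))
            # A KeyError here would mean the rows are not closed under multiplication
            f[(a + 1, b + 1)] = index_of[prod]
    return f

H = cyclic_hadamard(Z15)
f = structure_function(H)
print(f[(2, 3)], f[(2, 4)])   # values quoted in the text: 6 and 10
```

The same table also exhibits the other stated properties: f(a,a) = 1, symmetry in the two arguments, and invariance of c = f(a,b) under cyclic permutation of (a, b, c).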
APPENDIX B

If the connection tensor is fully symmetric, the velocity vector $\dot v$ given by (15) is
curlfree in the space of signals y, but not in activation space. This is shown as follows.
Considering $\dot v_a$ as a function of y, we have from (15)

$$ \partial \dot v_a/\partial y_d - \partial \dot v_d/\partial y_a = -\delta_{da}/s'(v_d) + \delta_{ad}/s'(v_a) + (S_{adb}+S_{abd})\, y_b - (S_{dab}+S_{dba})\, y_b = 0 , \tag{B1} $$

by the symmetry of the connection tensor, and the property of the Kronecker $\delta$. However,
if $\dot v_a$ is considered to be a function of v we have

$$ \partial \dot v_a/\partial v_d - \partial \dot v_d/\partial v_a = -\delta_{da} + \delta_{ad} + (S_{adb}+S_{abd})\, y_b\, s'(v_d) - (S_{dab}+S_{dba})\, y_b\, s'(v_a) , \tag{B2} $$

and this does not vanish if $s'(v_d) \ne s'(v_a)$. This can be remedied by multiplying $\dot v_a$ by
$s'(v_a)$; one has

$$ \partial(\dot v_a s'(v_a))/\partial v_d - \partial(\dot v_d s'(v_d))/\partial v_a = \dot v_a \delta_{da} s''(v_a) - \dot v_d \delta_{ad} s''(v_d) - s'(v_a)\delta_{da} + s'(v_d)\delta_{ad} $$
$$ \qquad + (S_{adb}+S_{abd})\, y_b\, s'(v_a) s'(v_d) - (S_{dab}+S_{dba})\, y_b\, s'(v_d) s'(v_a) . \tag{B3} $$

The first two terms cancel each other because of the Kronecker delta. Similarly, the third
and fourth terms cancel. The S terms now cancel because each has the same factor
$s'(v_a) s'(v_d)$, and, of course, the symmetry is needed as well. It follows that the vector field
$\dot v_a s'(v_a)$ is curlfree in activation space. If we follow this up and take the path integral of
$\dot v_a s'(v_a)$ in activation space we get the same Liapunov function as before:

$$ -\sum_a \int \dot v_a\, s'(v_a)\, dv_a = -\int \dot v_a\, dy_a , \tag{B4} $$

by a change of integration variables. On the right the summation convention is used; we
could not do that on the left because there are too many indices a.

The discussion clarifies the somewhat unnatural construction $-\int \dot v_a\, dy_a$ for the
Liapunov function (20), which is a path integral of the activation velocity vector in signal
space.


APPENDIX  C 

In this appendix we derive some properties and consequences of the neuron output
function given by (18),

$$ y = s(v) = \tanh(gv) , \tag{C1} $$

where g is the gain at zero. The inverse mapping, from y to v, is

$$ v = \frac{1}{2g} \ln \frac{1+y}{1-y} . \tag{C2} $$

One has

$$ s'(v) = g\, \mathrm{sech}^2(gv) = g(1-s^2) . \tag{C3} $$

The critical activation v* was introduced in the sequel for estimation purposes; its value is
chosen such that

$$ s(v^*) = 1 - \epsilon , \tag{C4} $$

where $\epsilon$ is a small positive number. For the hyperbolic tangent (C1) one has from (C2) for
small $\epsilon$

$$ v^* \approx \frac{1}{2g} \ln(2/\epsilon) , \tag{C5} $$

$$ v^* < \frac{1}{2g} \ln(2/\epsilon) , \tag{C6} $$

and

$$ s'(v^*) = g\epsilon(2-\epsilon) \approx 2g\epsilon . \tag{C7} $$
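The estimates for the critical activation can be checked with a few lines of arithmetic. This is a sketch only; the gain g = 5 and the tolerance eps = 0.001 are illustrative values chosen here, not values from the report, and the function names are assumptions.

```python
import math

def s(v, g):
    """Neuron output function: y = tanh(g v)."""
    return math.tanh(g * v)

def s_inv(y, g):
    """Inverse mapping from signal y to activation v."""
    return math.log((1 + y) / (1 - y)) / (2 * g)

def s_prime(v, g):
    """Slope of the output function: g (1 - s^2)."""
    return g * (1 - math.tanh(g * v) ** 2)

g, eps = 5.0, 1e-3                         # illustrative values, not from the report
v_star = s_inv(1 - eps, g)                 # exact critical activation
v_bound = math.log(2 / eps) / (2 * g)      # estimate and upper bound for small eps

print(v_star, v_bound)                     # v_star lies slightly below v_bound
print(s_prime(v_star, g), 2 * g * eps)     # slope at v_star versus 2 g eps
```

The exact value is ln((2-eps)/eps)/(2g), so replacing 2-eps by 2 gives both the approximation and the strict upper bound.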

In the proportional region, the signal is proportional to the activation, either
approximately or exactly. For the hyperbolic tangent output function, we need to know
the accuracy of the linear approximation. This accuracy may be calculated as the ratio of
the cubic term to the linear term in the power series expansion of the hyperbolic tangent:

$$ y = \tanh(gv) = gv - g^3 v^3/3 + \ldots \tag{C8} $$

The ratio of the cubic term to the linear term is

$$ g^2 v^2/3 \approx y^2/3 . \tag{C9} $$

From (24) one has

$$ \varphi(v) = v s - \int_0^v s(\xi)\, d\xi = v \tanh(gv) - \int_0^v \tanh(g\xi)\, d\xi = v \tanh(gv) - (1/g) \ln\cosh(gv) . \tag{C10} $$

If the energy E of (22) is considered a function in signal space, $\varphi$ needs to be expressed in
terms of y. This can be done by using (C2) in (C10):

$$ \varphi = \frac{y}{2g} \ln\frac{1+y}{1-y} + \frac{1}{2g} \ln\bigl(1/\cosh^2(gv)\bigr) $$
$$ \quad = \frac{1}{2g} \Bigl\{ y \ln\frac{1+y}{1-y} + \ln\bigl(1-\tanh^2(gv)\bigr) \Bigr\} = \frac{1}{2g} \Bigl\{ y \ln\frac{1+y}{1-y} + \ln(1-y^2) \Bigr\} $$
$$ \quad = \frac{1}{2g} \bigl\{ (1+y)\ln(1+y) + (1-y)\ln(1-y) \bigr\} . \tag{C11} $$
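The equality of the activation-space and signal-space expressions can be confirmed numerically. The sketch below is illustrative; the function names `phi_v` and `phi_y`, the gain g = 5, and the sample activations are assumptions made for this check.

```python
import math

def phi_v(v, g):
    """Activation-space form: v tanh(g v) - (1/g) ln cosh(g v)."""
    return v * math.tanh(g * v) - math.log(math.cosh(g * v)) / g

def phi_y(y, g):
    """Signal-space form: (1/(2g)) {(1+y) ln(1+y) + (1-y) ln(1-y)}."""
    return ((1 + y) * math.log(1 + y) + (1 - y) * math.log(1 - y)) / (2 * g)

g = 5.0                                    # illustrative gain, not from the report
for v in (-0.5, -0.1, 0.0, 0.2, 0.6):      # sample activations (hypothetical)
    y = math.tanh(g * v)
    print(v, phi_v(v, g), phi_y(y, g))     # the two columns should agree

# Near the origin the signal-space form is close to y^2 / (2 g)
y_small = 0.1
print(phi_y(y_small, g), y_small ** 2 / (2 * g))
```

The last line illustrates the quadratic behaviour near the origin: the next term in the expansion of the bracket is of fourth order in y.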

Hence, for the function $\Omega$ in the energy (22) we have

$$ \Omega = \frac{1}{2g} \sum_a \bigl\{ (1+y_a)\ln(1+y_a) + (1-y_a)\ln(1-y_a) \bigr\} . \tag{C12} $$

In the sequel we need a polynomial approximation of $\Omega$ near the origin. Such an
approximation good to powers $y_a^3$ is

$$ \Omega \approx R^2/(2g) , \tag{C13} $$

where

$$ R^2 = y_a y_a . \tag{C14} $$

For the piecewise linear output function

$$ s(v) = \begin{cases} -1 & \text{if } v < -1/g , \\ gv & \text{if } |v| \le 1/g , \\ 1 & \text{if } v > 1/g , \end{cases} \tag{C15} $$

the proportional region is given by

$$ |v| \le 1/g . \tag{C16} $$

In the proportional region we have for the function $\Omega$

$$ \Omega = R^2/(2g) ; \tag{C17} $$

this can be verified by taking partial derivatives:

$$ \partial\Omega/\partial y_a = y_a/g = v_a , \tag{C18} $$

as required.


APPENDIX D

For convenience we show here the proof of Lemma 1, which is just Theorem 2 of [9].
The proof is formally altered at some places because of a different factor in the Hadamard
transform.

It should be noted that in this appendix the signal y is restricted to be bipolar, i.e.,
y is a corner point of the solid hypercube $I_N$.

The material considered involves the vector $Q_a$ defined by (54),

$$ Q_a = N \sum_\alpha h_{\alpha a}\, y_\alpha^2 , \tag{D1} $$

where

$$ y_\alpha = (1/\sqrt{N})\, h_{\alpha}{}^{a}\, y_a \tag{D2} $$

is the Hadamard transform of the signals $y_a$.

Before stating and proving Lemma 1, we need some preparation. Define the index set A:

$$ A = \{ \alpha \mid y_\alpha \ne 0 \} , \tag{D3} $$

and the disjoint pieces

$$ A_a^+ = \{ \alpha \mid \alpha \in A \ \&\ h_{\alpha a} = + \} , \tag{D4} $$

and

$$ A_a^- = \{ \alpha \mid \alpha \in A \ \&\ h_{\alpha a} = - \} . \tag{D5} $$

Using this decomposition, (D1) may be written

$$ Q_a/N = \sum_{\alpha \in A_a^+} y_\alpha^2 - \sum_{\alpha \in A_a^-} y_\alpha^2 . \tag{D6} $$

From (A9) and the fact that the components $y_a$ are ±1, one has

$$ \sum_\alpha y_\alpha^2 = N ; \tag{D7} $$

this may be written as

$$ N = \sum_{\alpha \in A_a^+} y_\alpha^2 + \sum_{\alpha \in A_a^-} y_\alpha^2 . \tag{D8} $$

(D6) and (D8) give

$$ 2 \sum_{\alpha \in A_a^+} y_\alpha^2 = N + Q_a/N , \tag{D9} $$

and

$$ 2 \sum_{\alpha \in A_a^-} y_\alpha^2 = N - Q_a/N ; \tag{D10} $$

since the left hand sides are nonnegative, this implies

$$ -N^2 \le Q_a \le N^2 . \tag{D11} $$

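We show

Lemma D1: $y \in I_N$, $a \ne 1$, $Q_a = -N^2 \iff h_{\alpha a} = -$, $\forall \alpha \in A$.

Proof: To show $\Leftarrow$, we note that $h_{\alpha a} = -$, $\forall \alpha \in A$, implies that $a \ne 1$, by (A6), and

$$ Q_a = N \sum_\alpha h_{\alpha a}\, y_\alpha^2 = -N \sum_{\alpha \in A} y_\alpha^2 = -N^2 , \tag{D12} $$

by (D7). To show $\Rightarrow$, we note that $Q_a = -N^2$ and (D9) imply that the set $A_a^+$ is empty; i.e.,
$h_{\alpha a} = -$, $\forall \alpha \in A$. □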

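After this preparation we proceed with

Lemma 1: $y \in I_N$, $Q_b = -N^2$, $\forall b$ such that $y_b = - \iff$ y is Hadamard †.

Proof: Define the index set

$$ B = \{ b \mid y_b = - \} . \tag{D13} $$

Then the Hadamard transform (D2) may be written as

$$ y_\alpha = (1/\sqrt{N}) \Bigl\{ -\sum_{b \in B} h_{\alpha b} + \sum_{b \notin B} h_{\alpha b} \Bigr\} . \tag{D14} $$

The property (A5) of the Hadamard vectors used here implies that

$$ \sum_{b \notin B} h_{\alpha b} = N \delta_{\alpha 1} - \sum_{b \in B} h_{\alpha b} , \tag{D15} $$

so that (D14) may be written

$$ \sqrt{N}\, y_\alpha = N \delta_{\alpha 1} - 2 \sum_{b \in B} h_{\alpha b} . \tag{D16} $$

Suppose $Q_b = -N^2$, $\forall b \in B$. Then, Lemma D1 and (D16) give

$$ \alpha \in A : \quad \sqrt{N}\, y_\alpha = N \delta_{\alpha 1} + 2(N-W)/2 = N \delta_{\alpha 1} + N - W , \tag{D17} $$

where

$$ W = \sum_b y_b , \tag{D18} $$

and (N-W)/2 is the cardinality of the set B. Hence we have

Lemma D2: $Q_b = -N^2$, $\forall b \in B \implies \sqrt{N}\, y_\alpha = N \delta_{\alpha 1} + N - W$, $\forall \alpha \in A$.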




There appear to be two cases, Case 1: $\alpha = 1 \in A$, and Case 2: $\alpha = 1 \notin A$.

Case 1: $\alpha = 1 \in A$. Calculating $y_1$ of Lemma D2, we have

$$ \sqrt{N}\, y_1 = 2N - W . \tag{D19} $$

Also, from (D2), (D18), and the fact that $h_1$ has all + components we have

$$ \sqrt{N}\, y_1 = W . \tag{D20} $$

From (D19) and (D20) it follows that

$$ W = N , \tag{D21} $$

so that $y = h_1$, the all positive Hadamard vector. However, for $y = h_1$ (D1) gives
$Q_b = N^2$, in contradiction with $Q_b = -N^2$ assumed in Lemma D2. Hence, Case 1 is not
possible within the premises of Lemma D2.

Case 2: $\alpha = 1 \notin A$. From (D2) for $\alpha = 1$ and (D18) it follows that W=0. Hence, for
$\alpha \ne 1$, $\alpha \in A$, the $y_\alpha$ of Lemma D2 is just $\sqrt{N}$. With (A9) it follows that

$$ N = \sum_\alpha y_\alpha^2 = \sum_{\alpha \in A} y_\alpha^2 = rN , \tag{D22} $$

where r is the cardinality of set A. Since (D22) implies r=1, the set A contains only a single
element, say, $\gamma$. It follows that $y = h_\gamma$, and we may conclude

$$ Q_a = -N^2 , \ \forall a \in B \implies y \text{ is Hadamard †} , \tag{D23} $$

which is the forward part of Lemma 1.

The converse is also true, since for $y = h_\gamma$, $\gamma \ne 1$, we have $y_\alpha = 0$, $\forall \alpha \ne \gamma$, and $y_\gamma = \sqrt{N}$,
so that (D1) gives

$$ Q_b = N^2 h_{\gamma b} . \tag{D24} $$

For index b such that $y_b = h_{\gamma b} = -$ it follows that $Q_b = -N^2$. □
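The statement of Lemma 1 can be checked exhaustively for a small dimension. The sketch below is an illustration, not the proof; it assumes N = 8, uses the cyclic-S matrix built from the vector $z_7$ of Appendix A, and works with the integer quantity sqrt(N) times the Hadamard transform so that no rounding occurs. Corner points without any negative component are skipped, since for them the condition on Q is empty.

```python
from itertools import product

Z7 = "---+-++"  # the vector z_7 listed in Appendix A

def cyclic_hadamard(z):
    """Hadamard matrix of order len(z)+1 built from left shifts of z."""
    zs = [1 if c == "+" else -1 for c in z]
    rows = [[1] * (len(zs) + 1)]
    for k in range(len(zs)):
        rows.append([1] + zs[k:] + zs[:k])
    return rows

def q_vector(h, y):
    """Q_a = N * sum over alpha of h[alpha][a] * y_alpha^2, in exact integers."""
    n = len(h)
    # t[alpha] = sqrt(N) * y_alpha, hence N * y_alpha^2 = t[alpha]^2
    t = [sum(h[al][a] * y[a] for a in range(n)) for al in range(n)]
    return [sum(h[al][a] * t[al] ** 2 for al in range(n)) for a in range(n)]

def premise_holds(h, y):
    """True if Q_b = -N^2 for every index b with y_b = -1."""
    n = len(h)
    q = q_vector(h, y)
    return all(q[b] == -n * n for b in range(n) if y[b] == -1)

H8 = cyclic_hadamard(Z7)
hadamard_rows = {tuple(row) for row in H8}

mismatches = 0
for y in product((1, -1), repeat=8):
    if -1 not in y:
        continue  # no negative component: nothing to test
    if premise_holds(H8, y) != (y in hadamard_rows):
        mismatches += 1
print("mismatches:", mismatches)
```

For a Hadamard vector other than the first, `q_vector` returns N squared times that vector, in line with the converse part of the proof.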
[Figure 1: block diagram showing the BLT and DLS blocks; legend: -o- = thresholding]

Fig. 1. Selective Reflexive Memory (SRM), consisting of
a Bidirectional Linear Transformer (BLT) and a Dominant
Label Selector (DLS). The input vector x is bipolar. The
BLT performs a linear transformation u=Bx. The BLT
output u is presented to the DLS, which selects from u
the Hadamard vector $y = h_\beta$ that occurs with the largest
coefficient in the Hadamard expansion of u. The selected
Hadamard vector is returned to the BLT, and is processed
in the BLT backstroke to give $w = h_\beta B$. w is thresholded
to give x'. Things can be arranged such that x' is
the stored vector nearest the input x.


